US20070078871A1 - System for and method of de-identifying data - Google Patents

System for and method of de-identifying data

Info

Publication number
US20070078871A1
Authority
US
United States
Prior art keywords
transaction
individual
database
information
protected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/634,698
Inventor
Dane Iverson
Karen Davis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US11/634,698
Publication of US20070078871A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Definitions

  • Transaction database 206 may contain a personal information table, such as an individual information table 207, and a secure table, such as a secure transaction table 208. Since individual information table 207 and secure transaction table 208 contain protected information, these tables are generally inaccessible to an unauthorized user.
  • De-identification computer 209 may contain software to create secure cross-reference table 202, de-identification index table 204 and non-protected transaction table 205 based on information stored in individual information table 207 and secure transaction table 208 in transaction database 206.
  • a user may use software on client computer 210 to obtain de-identified data (e.g., by accessing de-identification index table 204) in de-identified database 203.
  • data processing system 200 may contain many more client computers and additional client sites.
  • client computer 210 may come with de-identified database 203 already installed.
  • FIG. 3 shows an exemplary secure cross-reference table 202, de-identification index table 204 and non-protected transaction table 205.
  • Secure cross-reference table 202 may contain identifiable information, such as name, social security number and date of birth.
  • Secure cross-reference table 202 may also contain de-identified data, such as a de-identification pointer.
  • De-identification index table 204 may contain de-identified data, such as a de-identification pointer and a transaction ID.
  • Non-protected transaction table 205 may also contain de-identified data.
  • the de-identification pointer enables a user to obtain protected identifiable information without identifying the individual associated with the information. For example, referring to FIG. 3, de-identification pointer 123456 is linked to patient “J.
  • De-identification pointer 123456 is a random number not related to identifiable information, such as name, social security number or date of birth. A user may use de-identification pointer 123456 to access de-identified transaction information in the non-protected transaction table. By using the de-identification pointer and not identifiable information, the user will be unable to identify the individual associated with the de-identified data.
  • a user may access de-identified data by using de-identification index table 204 and non-protected transaction table 205.
  • Methods and systems consistent with the present invention use role-based security to ensure that the data does not become identifiable and that the user may not access secure cross-reference table 202.
  • FIG. 4A depicts a more detailed view of de-identification computer 209, which contains memory 401, secondary storage device 403, central processing unit (CPU) 404, video display 405 and input/output (I/O) device 406.
  • Memory 401 stores de-identify software 402 that accesses identifiable database 201, de-identified database 203 and transaction database 206 to create de-identification index table 204 and non-protected transaction table 205.
  • FIG. 4B depicts a more detailed view of client computer 210, which contains memory 407, secondary storage device 409, central processing unit (CPU) 410, video display 411 and input/output (I/O) device 412.
  • Memory 407 stores client software 408 that may access de-identified database 203.
  • client software 408 may be the Business Intelligence Tools software, available from Sagent, Inc. or the MS Access Software, available from Microsoft.
  • FIG. 5 is a flow chart consistent with one embodiment of the present invention when de-identifying individually identifiable information.
  • the steps in FIG. 5 may be performed by an independent technical team with a sufficient level of access to identifiable database 201 and transaction database 206.
  • de-identify software 402 initiates secure cross-reference table 202, de-identification index table 204 and non-protected transaction table 205.
  • transaction database 206 may be made secure.
  • software 402 obtains a new record from individual information table 207 in transaction database 206 (step 502).
  • Software 402 may obtain the record by querying transaction database 206 for a next record.
  • transaction database 206 may store identifiable information and needs to be made secure so that the identifiable information within the database will be inaccessible by a user who must use de-identified data. Transaction database 206 is secured by changing the database views to control access to the identifiable information within the database.
  • de-identify software 402 may generate a random de-identification pointer not related to information in the associated record (step 504).
  • software 402 may use the “Random” class or the “SecureRandom” class, both available in the Java standard API. Both classes produce sequences of pseudorandom numbers based on a seed value. Since the Random and SecureRandom classes may generate the same number more than once, software 402 also verifies that each generated random number has not already been used in secure cross-reference table 202.
  • the de-identification pointer is an index key and, as such, may not be duplicated in secure cross-reference table 202.
  • Each de-identification pointer generated by software 402 may be checked against all other de-identification pointers in secure cross-reference table 202 to ensure that the de-identification pointer is not duplicated.
  • Other methods may be used to generate the de-identification pointer, such as a shuffling algorithm.
  • software 402 incorporates the de-identification pointer into the record and inserts the record as a new record in secure cross-reference table 202 (step 505). For example, for the patient “J. Doe,” software 402 may generate “123456” as the de-identification pointer and insert J. Doe's identifiable information and newly created de-identification pointer into secure cross-reference table 202.
  • the record may include the de-identification pointer and a transaction ID.
  • the transaction ID may be obtained from secure transaction table 208 .
  • A record in de-identification index table 204 is created for each transaction that an individual has submitted and that is stored in secure transaction table 208. Therefore, each time a new de-identification pointer is generated, software 402 searches secure transaction table 208 for all transactions associated with the individual to whom the de-identification pointer belongs. For example, since J. Doe has had three transactions (4329, 2049 and 2002), three records will be added to de-identification index table 204.
  • the de-identification index table links the de-identification pointers to the corresponding transaction information in the non-protected transaction table.
  • software 402 adds a record to non-protected transaction table 205 (step 507).
  • the record may include the transaction ID and other information associated with the transaction obtained from secure transaction table 208 (e.g., date, procedure code, billing code and amount).
  • a user may access de-identified information stored in non-protected transaction table 205.
  • if more records remain, software 402 may obtain the next record (step 502). Otherwise, a user may begin retrieving de-identified data by using client software 408 and accessing de-identification index table 204 and non-protected transaction table 205.
  • FIG. 6 is a flow chart consistent with one embodiment of the present invention when retrieving de-identified data from non-protected transaction table 205.
  • client software 408 may be initiated, for example, by “double-clicking” on an icon (using a mouse) associated with software 408 or by typing the software name on a command line. Note that software 408 may be initiated using other methods, such as automatically executing the software during the startup sequence of client computer 210.
  • a user may transmit search parameters to software 408 (step 602).
  • the search parameters enable a user to locate de-identified data in de-identified database 203.
  • search parameters may be procedure code, date or a de-identification pointer.
  • software 408 searches de-identified database 203 (step 603).
  • Software 408 may search both de-identification index table 204 and non-protected transaction table 205 for any matches of the search parameters.
  • client software 408 may display the search results to the user (step 604).
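
The FIG. 5 and FIG. 6 steps listed above can be illustrated with a minimal Python sketch. It is not the patent's implementation: Python's `secrets` module stands in for the Java `SecureRandom` class named in the text, the six-digit pointer width is an assumption taken from the "123456" example, and every function name, field name, date of birth and procedure code below is hypothetical.

```python
import secrets


def generate_pointer(used, digits=6):
    """Draw a random pointer, retrying until it is not already in `used`."""
    low = 10 ** (digits - 1)
    span = 9 * low  # count of distinct values with exactly `digits` digits
    if len(used) >= span:
        raise RuntimeError("pointer space exhausted")
    while True:
        candidate = low + secrets.randbelow(span)
        if candidate not in used:  # uniqueness check against the secure table
            return candidate


def de_identify(individuals, transactions):
    """Build the three tables: secure cross-reference, index, non-protected."""
    secure_xref = {}    # pointer -> identifiable record (restricted access)
    index_table = []    # (pointer, transaction_id) pairs
    non_protected = {}  # transaction_id -> transaction fields without the SSN
    for person in individuals:
        pointer = generate_pointer(secure_xref.keys())
        secure_xref[pointer] = dict(person)
        for txn in transactions:
            if txn["ssn"] == person["ssn"]:
                index_table.append((pointer, txn["transaction_id"]))
                non_protected[txn["transaction_id"]] = {
                    key: value for key, value in txn.items() if key != "ssn"
                }
    return secure_xref, index_table, non_protected


def search(index_table, non_protected, pointer=None, procedure_code=None):
    """Return de-identified transactions matching the given search parameters."""
    results = []
    for ptr, txn_id in index_table:
        record = non_protected[txn_id]
        if pointer is not None and ptr != pointer:
            continue
        if procedure_code is not None and record["procedure_code"] != procedure_code:
            continue
        results.append({"pointer": ptr, **record})
    return results


# Placeholder data: only the name, SSN and transaction IDs come from the text.
individuals = [{"ssn": "123-45-6789", "name": "J. Doe", "date_of_birth": "1970-01-01"}]
transactions = [
    {"transaction_id": 4329, "ssn": "123-45-6789", "procedure_code": "P100"},
    {"transaction_id": 2049, "ssn": "123-45-6789", "procedure_code": "P200"},
    {"transaction_id": 2002, "ssn": "123-45-6789", "procedure_code": "P200"},
]
secure_xref, index_table, non_protected = de_identify(individuals, transactions)
```

In this sketch only `secure_xref` holds identifiable fields, so a client would be handed `index_table` and `non_protected` and would call `search` with a pointer or a procedure code.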

Abstract

A method of de-identifying data, wherein the data to be de-identified is stored in a transaction table containing transactions and a personal information table containing identifiable information. The method includes the steps of generating a de-identification pointer associated with an individual in the personal information table, wherein the individual is associated with at least one transaction in the transaction table; creating a non-protected transaction table, wherein the non-protected transaction table includes a non-protected transaction reference and non-protected information associated with a transaction from the transaction table; and creating an index table including the identification and the non-protected transaction reference. According to a preferred embodiment, the identification is advantageously unique and may also lack context to the individual. According to a further feature, the identification may be random or pseudo-random.

Description

    FIELD OF INVENTION
  • The present invention relates to data processing systems and, more particularly, to a system for and method of de-identifying data.
  • BACKGROUND
  • Privacy concerns among individuals and lawmakers have grown in recent years. It is desirable for companies that store records containing individually identifiable information to secure the information so that it is not readily available to users who do not need access to it. For example, in 1996, Congress enacted the Health Insurance Portability and Accountability Act (HIPAA). HIPAA imposes strict privacy rules on the insurance and health care industries. In a broad sense, HIPAA protects a patient's privacy in his or her medical records and secures a patient's individual health care information.
  • In addition to securing identifiable information, companies still need to “de-identify” protected information received or created in the course of business. De-identified data is data that, alone or in combination with other information, cannot readily identify an individual. A company may need to de-identify individually identifiable information so that the company may continue to perform research on the data and/or distribute the de-identified data to third parties. When all individually identifiable information is de-identified, an individual's identity and any personal information that may identify that individual remain protected. Traditionally, companies de-identify records by “stripping” out all individually identifiable information from those records.
  • Once the identifiable information is de-identified, the de-identified data may generally be used or disclosed for any purpose (e.g., research), as long as it is not re-identified. The protected identifiable information is generally stored in a database administered by a company. These databases may be organized as sets of tables. One or more tables may include all personal identifiable information related to an individual, with data elements such as social security number, name, age, date of birth and address. One or more other tables may include transaction information associated with transactions submitted by and for individuals, with data elements such as social security number, date of transaction, transaction code, amount and transaction ID. The transaction ID may be unique for each transaction in the transaction table.
  • Depending on how a company has organized its identifiable information and its transaction information, the individual information table may be located within a master database or be part of a separate database. Similarly, the transaction table may be located within a master database or be part of a separate database. Whether the tables reside in separate databases or within the same master database, at least one field is present to link each record to one or more other elements in the database. For example, the social security number and, possibly, the transaction ID may be included in each table so that related information may be linked across tables or databases.
  • For example, health care information databases may include personal identifiable information related to an individual in an individual information table. An individual information table may include data elements, such as social security number, name, date of birth, address, member number, and Medicare status. One or more other tables, such as a transaction table, may include claim transaction information associated with health care claims (transactions) submitted by and for patients in the individual information table. A transaction table may include data elements, such as social security number, date of service, diagnosis code, procedure code, billed amount and transaction ID. The transaction ID may be unique for each claim in the transaction table.
  • FIG. 1 shows a portion of an exemplary health care database schema 100 with individual information table 101 and transaction table 102. An individual in individual information table 101 may be linked to one or more transactions in transaction table 102 by the individual's social security number. For example, social security number 123-45-6789 is linked to three transactions (transaction ID nos. 4329, 2049 and 2002).
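
As a rough illustration of the prior-art linkage in FIG. 1, the following Python sketch builds two in-memory SQLite tables joined on the social security number. The table names, column names, procedure codes and amounts are hypothetical placeholders; only the social security number and the three transaction IDs come from the text, and the name "J. Doe" is borrowed from the example used elsewhere in the document.

```python
import sqlite3

# Hypothetical table and column names modeled on the FIG. 1 schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE individual_information (
        ssn TEXT PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        ssn TEXT NOT NULL REFERENCES individual_information(ssn),
        procedure_code TEXT,
        billed_amount REAL
    );
    """
)
conn.execute(
    "INSERT INTO individual_information VALUES (?, ?)",
    ("123-45-6789", "J. Doe"),
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (4329, "123-45-6789", "P100", 120.0),  # placeholder codes and amounts
        (2049, "123-45-6789", "P200", 75.5),
        (2002, "123-45-6789", "P200", 40.0),
    ],
)


def transactions_for(ssn):
    """Join the two tables on the SSN, as the prior-art schema allows."""
    return conn.execute(
        "SELECT i.name AS name, t.transaction_id AS transaction_id "
        "FROM individual_information AS i "
        "JOIN transactions AS t ON t.ssn = i.ssn "
        "WHERE i.ssn = ? ORDER BY t.transaction_id",
        (ssn,),
    ).fetchall()
```

Because the social security number is the join key, every transaction row returned by `transactions_for` carries the individual's identity, which is the exposure the background section describes.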
  • To limit access to the databases and tables within a company's databases, a company (or database administrator) may use “role based security.” Commonly available in most major database management systems (DBMS), role based security controls access to tables and/or data elements within tables on a per-user basis. Role based security also defines access levels for each database user within the database's security scheme. For example, user A may have a level of access authorization that enables user A to view all data elements and all tables of a particular database. In contrast, user B may have a limited level of access authorization that enables user B to access only half of the tables and, within those tables, only 50% of the data elements.
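
The user A and user B example can be modeled with a small Python sketch. A real DBMS would enforce this through its own grant and view mechanisms; the dictionary-based model, the user names, and the table and column names below are assumptions made only for illustration.

```python
# Hypothetical grants: table name -> set of data elements a user may read.
ALL_GRANTS = {
    "individual_information": {"ssn", "name", "date_of_birth", "address"},
    "transactions": {
        "transaction_id",
        "date_of_service",
        "procedure_code",
        "billed_amount",
    },
}
ROLE_GRANTS = {
    # user A: every table and every data element
    "user_a": ALL_GRANTS,
    # user B: one of the two tables, and two of its four data elements
    "user_b": {"transactions": {"date_of_service", "procedure_code"}},
}


def readable_elements(user, table):
    """Return the data elements of `table` that `user` is allowed to read."""
    return ROLE_GRANTS.get(user, {}).get(table, set())


def read_row(user, table, row):
    """Project a row down to the elements the user's grants permit."""
    allowed = readable_elements(user, table)
    if not allowed:
        raise PermissionError(f"{user} has no access to {table}")
    return {name: value for name, value in row.items() if name in allowed}


# Placeholder claim record used to exercise the two access levels.
claim = {
    "transaction_id": 4329,
    "date_of_service": "2006-01-15",
    "procedure_code": "P100",
    "billed_amount": 120.0,
}
```

Here `read_row("user_a", "transactions", claim)` returns the whole record, while the same call for `user_b` returns only the date of service and procedure code, and any attempt by `user_b` to read `individual_information` raises `PermissionError`.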
  • As explained, most of the information that privacy regulations may mark as protected is individually identifiable information and may be used to identify an individual. Accordingly, there is a need to de-identify data, to make de-identified data available, and to protect individually identifiable information from uses that fall outside those permitted uses in various privacy regulations.
  • SUMMARY OF THE INVENTION
  • A method of de-identifying data, wherein the data to be de-identified is stored in a transaction table containing transactions and a personal information table containing identifiable information. The method includes the steps of generating a de-identification pointer associated with an individual in the personal information table, wherein the individual is associated with at least one transaction in the transaction table; creating a non-protected transaction table, wherein the non-protected transaction table includes a non-protected transaction reference and non-protected information associated with a transaction from the transaction table; and creating an index table including the identification and the non-protected transaction reference. According to a preferred embodiment, the identification is advantageously unique and may also lack context to the individual. According to a further feature, the identification may be random or pseudo-random.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows a prior-art database storing individually identifiable information;
  • FIG. 2 shows an embodiment of a system for use with methods and systems consistent with the present invention;
  • FIG. 3 shows an exemplary de-identified scheme storing de-identified data for use with methods and systems consistent with the present invention;
  • FIG. 4A is a block diagram showing additional detail of the de-identification computer according to an embodiment of the invention as depicted in FIG. 2;
  • FIG. 4B is a block diagram showing additional detail of the client computer according to an embodiment of the invention as depicted in FIG. 2;
  • FIG. 5 shows a flow chart representing one embodiment of the present invention; and
  • FIG. 6 shows a second flow chart representing one embodiment of the present invention.
  • DETAILED DESCRIPTION
  • An embodiment consistent with the present invention de-identifies individually identifiable information. In many cases, individually identifiable information must be de-identified before a user may access the information. Identifiable data includes attributes that may positively identify an individual associated with the identifiable data. An embodiment consistent with the present invention de-identifies identifiable data so that a user may access the data without identifying an individual associated with the data. De-identified data is data that is not identifiable as belonging to a particular individual. Identifiable data may be de-identified by removing data elements that could potentially identify the individual (e.g., name, telephone number, social security number, account numbers).
  • To de-identify the identifiable data, methods and systems consistent with the present invention generate a random identification not derived from the identifiable information, known as a “de-identification pointer.” Each de-identification pointer is associated with an individual and the individual's personal identifiable information, but the pointer is not derived from the individual's personal identifiable information. The de-identification pointer enables a user to obtain de-identified data since the de-identification pointer is substituted for all personal identifiable information for the same individual. The de-identification pointer and associated identifiable information may be stored in a secure table not accessible to users. Since a de-identification pointer assumes the role of the identifiable information, a user that requires access to de-identified data will not need to access any identifiable information stored in the secure table.
  • The de-identification pointer may also be stored in an index table. The index table links the individual (using the de-identification pointer) to the claims (transactions) associated with the individual. A user that requires access to de-identified data may access the index table since the index table provides a link to transaction data without identifying the individual. That is, the index table enables a user to retrieve transactions and transaction information from a non-protected transaction table without identifying the individual associated with the transaction.
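
The index-table lookup described above can be sketched in Python with in-memory SQLite tables. The table names, column names, procedure codes and amounts are hypothetical; the pointer value 123456 and the transaction IDs 4329, 2049 and 2002 are the example values used elsewhere in the document.

```python
import sqlite3

# Hypothetical de-identified schema: neither table stores a name or an SSN.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE deid_index (
        deid_pointer INTEGER NOT NULL,
        transaction_id INTEGER NOT NULL
    );
    CREATE TABLE non_protected_transactions (
        transaction_id INTEGER PRIMARY KEY,
        procedure_code TEXT,
        billed_amount REAL
    );
    """
)
conn.executemany(
    "INSERT INTO deid_index VALUES (?, ?)",
    [(123456, 4329), (123456, 2049), (123456, 2002)],
)
conn.executemany(
    "INSERT INTO non_protected_transactions VALUES (?, ?, ?)",
    [(4329, "P100", 120.0), (2049, "P200", 75.5), (2002, "P200", 40.0)],
)


def transactions_for_pointer(pointer):
    """Follow the index table from a pointer to its de-identified transactions."""
    cursor = conn.execute(
        "SELECT x.deid_pointer AS deid_pointer, "
        "t.transaction_id AS transaction_id, "
        "t.procedure_code AS procedure_code, "
        "t.billed_amount AS billed_amount "
        "FROM deid_index AS x "
        "JOIN non_protected_transactions AS t "
        "ON t.transaction_id = x.transaction_id "
        "WHERE x.deid_pointer = ? ORDER BY t.transaction_id",
        (pointer,),
    )
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

A query such as `transactions_for_pointer(123456)` groups all of one individual's transactions together, yet no returned column identifies who that individual is.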
  • The present invention provides a number of benefits over traditional de-identification systems. First, the present invention enables users to automatically access transaction information without identifying an individual associated with the transaction information. Second, the present invention enables a collector of identifiable information (e.g., a health care provider or a health care payor) to de-identify the individually identifiable information and use such de-identified information for various purposes, such as research. For example, a user may “mine” the de-identified data for information (e.g., number of patients having diabetes or number of times a particular patient visited the hospital). Third, the present invention effectively limits access to only non-protected information by running a de-identification process on the individually identifiable information. After the de-identification process has been executed on the individually identifiable information, all identifiable information will be de-identified. The de-identification pointer, database views and role-based security together prohibit a user from accessing the individually identifiable information.
  • FIG. 2 depicts a data processing system 200 suitable for practicing methods and systems consistent with the present invention. Data processing system 200 includes identifiable database 201 and de-identified database 203, both connected to transaction database 206 and de-identification computer 209. Client computer 210 may be connected to de-identified database 203. Identifiable database 201 may contain protected information, such as individually identifiable information associated with individuals, in a secure cross-reference table 202. As such, only a user with a sufficient level of access, such as a database administrator administering the de-identification process, may access information in identifiable database 201.
  • De-identified database 203 may contain de-identified information based on the individually identifiable information in transaction database 206. The information in de-identified database 203 does not identify an individual. As such, a user may access de-identified database 203 to obtain information without identifying the individual. For example, de-identified database 203 may contain non-protected information, such as de-identified data associated with individuals in a de-identification index table 204 and non-protected transaction table 205.
  • Transaction database 206 may contain a personal information table, such as an individual information table 207, and a secure table, such as a secure transaction table 208. Since individual information table 207 and secure transaction table 208 contain protected information, these tables are generally inaccessible to an unauthorized user.
  • De-identification computer 209 may contain software to create secure cross-reference table 202, de-identification index table 204 and non-protected transaction table 205 based on information stored in individual information table 207 and secure transaction table 208 in transaction database 206. A user may use software on client computer 210 to obtain de-identified data (e.g., by accessing de-identification index table 204) in de-identified database 203.
  • Although only one client computer 210 is depicted, one skilled in the art will appreciate that data processing system 200 may contain many more client computers and additional client sites. One skilled in the art will also appreciate that client computer 210 may come with de-identified database 203 already installed.
  • FIG. 3 shows an exemplary secure cross-reference table 202, de-identification index table 204 and non-protected transaction table 205. Secure cross-reference table 202 may contain identifiable information, such as name, social security number and date of birth. Secure cross-reference table 202 may also contain de-identified data, such as a de-identification pointer. De-identification index table 204 may contain de-identified data, such as a de-identification pointer and a transaction ID. Non-protected transaction table 205 may also contain de-identified data. As explained, the de-identification pointer enables a user to obtain de-identified information without identifying the individual associated with the information. For example, referring to FIG. 3, de-identification pointer 123456 is linked to patient “J. Doe” and transaction IDs 4329, 2049 and 2002. De-identification pointer 123456 is a random number not related to identifiable information, such as name, social security number or date of birth. A user may use de-identification pointer 123456 to access de-identified transaction information in the non-protected transaction table. By using the de-identification pointer and not identifiable information, the user will be unable to identify the individual associated with the de-identified data.
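  • The relationship among the three tables of FIG. 3 can be sketched as plain data structures. This is only an illustration: the field names, the placeholder values and the helper `transactions_for` are assumptions that do not appear in the specification; only the pointer 123456, the name “J. Doe” and the three transaction IDs come from the text.

```python
# Table 202 (restricted): identifiable data plus the pointer.
# The ssn/dob values are placeholders, not data from FIG. 3.
secure_cross_reference = [
    {"pointer": 123456, "name": "J. Doe", "ssn": "<hidden>", "dob": "<hidden>"},
]

# Table 204: links a pointer to transaction IDs, with no identifiable fields.
deid_index = [
    {"pointer": 123456, "transaction_id": 4329},
    {"pointer": 123456, "transaction_id": 2049},
    {"pointer": 123456, "transaction_id": 2002},
]

# Table 205: de-identified transaction details keyed by transaction ID.
# The procedure codes are hypothetical.
non_protected_transactions = {
    4329: {"procedure_code": "P-A"},
    2049: {"procedure_code": "P-B"},
    2002: {"procedure_code": "P-A"},
}


def transactions_for(pointer):
    """Follow the index table from a pointer to its de-identified transactions."""
    ids = [row["transaction_id"] for row in deid_index if row["pointer"] == pointer]
    return [{"transaction_id": t, **non_protected_transactions[t]} for t in ids]


rows = transactions_for(123456)
```

  • Note that the lookup never touches `secure_cross_reference`, which mirrors the point that a user working from the pointer has no need to read the secure table.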
  • A user may access de-identified data by using de-identification index table 204 and non-protected transaction table 205. Methods and systems consistent with the present invention use role-based security to ensure that the data does not become identifiable and that the user may not access secure cross-reference table 202.
  • FIG. 4A depicts a more detailed view of de-identification computer 209, which contains memory 401, secondary storage device 403, central processing unit (CPU) 404, video display 405 and input/output (I/O) device 406. Memory 401 stores de-identify software 402 that accesses identifiable database 201, de-identified database 203 and transaction database 206 to create de-identification index table 204 and non-protected transaction table 205.
  • FIG. 4B depicts a more detailed view of client computer 210, which contains memory 407, secondary storage device 409, central processing unit (CPU) 410, video display 411 and input/output (I/O) device 412. Memory 407 stores client software 408 that may access de-identified database 203. An example of client software 408 may be the Business Intelligence Tools software, available from Sagent, Inc. or the MS Access Software, available from Microsoft.
  • FIG. 5 is a flow chart consistent with one embodiment of the present invention when de-identifying individually identifiable information. To protect identifiable information from being seen by a user, the steps in FIG. 5 may be performed by an independent technical team with a sufficient level of access to identifiable database 201 and transaction database 206. In step 501, de-identify software 402 initiates secure cross-reference table 202, de-identification index table 204 and non-protected transaction table 205. By implementing database views, transaction database 206 may be made secure. Once the tables are initiated, software 402 obtains a new record from individual information table 207 in transaction database 206 (step 502). Software 402 may obtain the record by querying transaction database 206 for a next record. As explained, transaction database 206 may store identifiable information and needs to be made secure so that the identifiable information within the database will be inaccessible to a user who must use de-identified data. Transaction database 206 is secured by changing the database views to control access to the identifiable information within the database.
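  • As a rough illustration of how a database view can hide identifiable columns, the sketch below uses Python's built-in `sqlite3` module. The table name, view name and column names are hypothetical, and SQLite itself has no user privileges, so in practice the database system would also have to grant users access to the view only; that part is assumed and not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE secure_transaction (
        transaction_id  INTEGER PRIMARY KEY,
        individual_name TEXT,   -- identifiable column
        procedure_code  TEXT,
        amount          REAL
    );
    INSERT INTO secure_transaction VALUES (4329, 'J. Doe', 'P-A', 120.0);

    -- The view exposes only the non-identifying columns.
    CREATE VIEW transaction_public AS
        SELECT transaction_id, procedure_code, amount
        FROM secure_transaction;
    """
)

cursor = conn.execute("SELECT * FROM transaction_public")
visible_columns = [d[0] for d in cursor.description]
visible_rows = cursor.fetchall()
```

  • A user restricted to `transaction_public` sees the transaction but not the name stored in the underlying table.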
  • Next, if the record is not already stored in secure cross-reference table 202 (step 503), de-identify software 402 may generate a random de-identification pointer not related to information in the associated record (step 504). For example, software 402 may use a “Random class” or “SecureRandom class,” both available in the Java standard API. Both classes produce sequences of pseudorandom numbers based on a seed value. Since the Random and SecureRandom classes may generate the same random number more than once, software 402 also verifies that each generated random number has not been used in secure cross-reference table 202. The de-identification pointer is an index key and, as such, the de-identification pointer may not be duplicated in secure cross-reference table 202. Each de-identification pointer generated by software 402 may be checked against all other de-identification pointers in secure cross-reference table 202 to ensure that the de-identification pointer is not duplicated. One skilled in the art will appreciate that other methods may be used to generate the de-identification pointer, such as a shuffling algorithm.
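  • The generate-and-check loop of step 504 might look like the following sketch. It uses Python's `secrets` module as a stand-in for the Java classes named above; the function name `new_pointer`, the six-digit width and the in-memory `used` set (standing in for a lookup against secure cross-reference table 202) are assumptions made for illustration.

```python
import secrets


def new_pointer(used, digits=6):
    """Draw random pointers until one is found that is not already in `used`."""
    low = 10 ** (digits - 1)   # smallest value with the requested digit count
    span = 9 * low             # how many values have that digit count
    if len(used) >= span:
        raise RuntimeError("pointer space exhausted")
    while True:
        candidate = low + secrets.randbelow(span)
        if candidate not in used:      # reject duplicates, as the text requires
            used.add(candidate)
            return candidate


used_pointers = set()
pointers = [new_pointer(used_pointers) for _ in range(1000)]
```

  • Because the pointer comes only from the random source, nothing about the individual's record can be recovered from it.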
  • Once the de-identification pointer is generated, software 402 incorporates the de-identification pointer into the record and inserts the record as a new record in secure cross-reference table 202 (step 505). For example, for the patient “J. Doe,” software 402 may generate “123456” as the de-identification pointer and insert J. Doe's identifiable information and newly created de-identification pointer into secure cross-reference table 202.
  • Next, software 402 adds a record to de-identification index table 204 (step 506). The record may include the de-identification pointer and a transaction ID. The transaction ID may be obtained from secure transaction table 208. One record is created in de-identification index table 204 for each transaction that an individual has submitted and that is stored in secure transaction table 208. Therefore, each time a new de-identification pointer is generated, software 402 searches secure transaction table 208 for all transactions associated with the individual associated with the de-identification pointer. For example, since J. Doe has had three transactions (4329, 2049 and 2002), three records will be added to de-identification index table 204. As explained, the de-identification index table links the de-identification pointers to the corresponding transaction information in the non-protected transaction table.
  • Next, software 402 adds a record to non-protected transaction table 205 (step 507). For example, the record may include the transaction ID and other information associated with the transaction obtained from secure transaction table 208 (e.g., date, procedure code, billing code and amount). A user may access de-identified information stored in non-protected transaction table 205. Finally, if there is another record in individual information table 207 (step 508), software 402 may obtain the new record (step 502). Otherwise, a user may begin retrieving de-identified data by using client software 408 and accessing de-identification index table 204 and non-protected transaction table 205.
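  • Steps 502 through 508 can be pulled together in one loop, sketched below under several assumptions: records are plain dictionaries, an `ssn` field is what links an individual to that individual's transactions, `IDENTIFYING_FIELDS` lists the columns to strip, and an individual already present in the cross-reference table is simply skipped. None of these details is specified in the text.

```python
import secrets

IDENTIFYING_FIELDS = {"name", "ssn", "dob"}  # assumed identifying columns


def de_identify(individuals, secure_transactions):
    cross_ref = {}       # table 202: link key -> record including the pointer
    index_table = []     # table 204: (pointer, transaction_id) pairs
    non_protected = {}   # table 205: transaction_id -> de-identified fields
    used = set()
    for person in individuals:                      # steps 502 and 508
        key = person["ssn"]
        if key in cross_ref:                        # step 503
            continue
        pointer = 100_000 + secrets.randbelow(900_000)   # step 504
        while pointer in used:
            pointer = 100_000 + secrets.randbelow(900_000)
        used.add(pointer)
        cross_ref[key] = {**person, "pointer": pointer}  # step 505
        for tx in secure_transactions:
            if tx["ssn"] != key:
                continue
            tx_id = tx["transaction_id"]
            index_table.append((pointer, tx_id))         # step 506
            non_protected[tx_id] = {                     # step 507
                k: v for k, v in tx.items() if k not in IDENTIFYING_FIELDS
            }
    return cross_ref, index_table, non_protected
```

  • Only `index_table` and `non_protected` would be exposed to ordinary users; `cross_ref` corresponds to the secure table and stays restricted.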
  • FIG. 6 is a flow chart consistent with one embodiment of the present invention when retrieving de-identified data from non-protected transaction table 205. In step 601, client software 408 may be initiated, for example, by “double-clicking” on an icon (using a mouse) associated with software 408 or typing in the software name from a command line. Note that software 408 may be initiated using other methods, such as automatically executing the software during the startup sequence of client computer 210.
  • Once initiated, a user may transmit search parameters to software 408 (step 602). The search parameters enable a user to locate de-identified data in de-identified database 203. For example, search parameters may be procedure code, date or a de-identification pointer. Once software 408 receives the search parameters, software 408 searches de-identified database 203 (step 603). Software 408 may search both de-identification index table 204 and non-protected transaction table 205 for any matches of the search parameters. Once the search is completed, client software 408 may display the search results to the user in step 604.
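  • A minimal sketch of the search in steps 602 through 604 follows. The function `search`, its keyword-argument interface and the sample rows are hypothetical; the sketch only shows that a query by pointer, procedure code or date can be answered from the index table and the non-protected transaction table alone.

```python
def search(index_table, transactions, pointer=None, **criteria):
    """Return de-identified rows matching an optional pointer and field values."""
    results = []
    for ptr, tx_id in index_table:
        if pointer is not None and ptr != pointer:
            continue
        record = transactions[tx_id]
        if all(record.get(field) == value for field, value in criteria.items()):
            results.append({"pointer": ptr, "transaction_id": tx_id, **record})
    return results


# Hypothetical contents of tables 204 and 205.
sample_index = [(123456, 4329), (123456, 2049), (654321, 3001)]
sample_transactions = {
    4329: {"procedure_code": "P-A", "date": "2002-01-15"},
    2049: {"procedure_code": "P-B", "date": "2002-02-03"},
    3001: {"procedure_code": "P-A", "date": "2002-02-03"},
}

by_pointer = search(sample_index, sample_transactions, pointer=123456)
by_code = search(sample_index, sample_transactions, procedure_code="P-A")
```

  • The results carry the pointer in place of any name, so a user can count visits per individual without learning who the individual is.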

Claims (6)

1-6. (canceled)
7. A method of retrieving de-identified data from a non-protected transaction table, executed in a data processing system, comprising:
generating a de-identification pointer that substitutes data identifying an individual;
creating the non-protected transaction table, wherein the non-protected transaction table includes de-identified transactional data corresponding to the individual, the de-identified transactional data not being capable of identifying the individual;
creating an index table including the de-identification pointer and at least a portion of the non-protected transaction table associated with the individual;
receiving search parameters, wherein the search parameters are used to locate de-identified data in the non-protected transaction table;
locating at least one record that matches the search parameters; and
transmitting the located records to a user.
8. The method of claim 7, wherein locating at least one record comprises searching the non-protected transactional table for the at least one record.
9. A data processing system for de-identifying data comprising:
an identifiable database containing protected personal identifiable information, wherein the personal identifiable information does identify an individual;
a de-identified database containing non-protected transaction information and a de-identification pointer, wherein the de-identification pointer and the non-protected transaction information do not identify an individual;
a transaction database containing transactions, wherein the transactions do identify an individual;
an index table comprising the de-identification pointer and at least a portion of the de-identified database associated with the individual; and
a de-identification computer, wherein the de-identification computer creates the de-identification pointer in the de-identified database based on information in the identifiable database and the transaction database.
10. The data processing system of claim 9, further comprising means for securing the transaction database by implementing a database view to control access to the transaction database.
11-17. (canceled)
US11/634,698 2002-05-22 2006-12-06 System for and method of de-identifying data Abandoned US20070078871A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/634,698 US20070078871A1 (en) 2002-05-22 2006-12-06 System for and method of de-identifying data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/151,974 US7158979B2 (en) 2002-05-22 2002-05-22 System and method of de-identifying data
US11/634,698 US20070078871A1 (en) 2002-05-22 2006-12-06 System for and method of de-identifying data

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/151,974 Continuation US7158979B2 (en) 2002-05-22 2002-05-22 System and method of de-identifying data

Publications (1)

Publication Number Publication Date
US20070078871A1 true US20070078871A1 (en) 2007-04-05

Family

ID=29548424

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/151,974 Expired - Lifetime US7158979B2 (en) 2002-05-22 2002-05-22 System and method of de-identifying data
US11/634,698 Abandoned US20070078871A1 (en) 2002-05-22 2006-12-06 System for and method of de-identifying data

Country Status (1)

Country Link
US (2) US7158979B2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010054155A1 (en) * 1999-12-21 2001-12-20 Thomas Hagan Privacy and security method and system for a World-Wide-Web site
US20020002550A1 (en) * 2000-02-10 2002-01-03 Berman Andrew P. Process for enabling flexible and fast content-based retrieval
US20020073099A1 (en) * 2000-12-08 2002-06-13 Gilbert Eric S. De-identification and linkage of data records
US6418441B1 (en) * 1998-03-27 2002-07-09 Charles G. Call Methods and apparatus for disseminating product information via the internet using universal product codes

Also Published As

Publication number Publication date
US20030220927A1 (en) 2003-11-27
US7158979B2 (en) 2007-01-02

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION