SBIR-STTR Award

A Framework And Decision Tool For Confidentiality Protection In Public Use Data
Award last edited on: 12/29/11

Sponsored Program
SBIR
Awarding Agency
NIH : NIMH
Total Award Amount
$849,474
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
James P Kelly

Company Information

OptTek Systems Inc (AKA: Optimization Technologies Inc)

2241 Seventeenth Street
Boulder, CO 80302
   (303) 447-3255
   optinfo@opttek.com
   www.opttek.com
Location: Single
Congr. District: 02
County: Boulder

Phase I

Contract Number: 1R43MH086138-01A1
Start Date: 00/00/00    Completed: 00/00/00
Phase I year
2008
Phase I Amount
$99,843
The objective of this project is to develop an innovative technique for avoiding disclosure of confidential data in public use tabular data. Our proposed technique, called Optimal Data Switching (OS), overcomes the limitations and disadvantages of currently deployed disclosure limitation methods. Statistical databases for public use pose a critical problem: how to make the data available for analysis without disclosing information that would infringe on privacy, violate confidentiality, or endanger national security. Organizations in both the public and private sectors have a major stake in this confidentiality protection problem, because access to data is essential for advancing research and formulating policy. Yet the possibility of extracting certain sensitive elements of information from the data can jeopardize the welfare of these organizations and, in some instances, the welfare of the society in which they operate. The challenge, therefore, is to represent the data in a form that permits accurate analysis for supporting research, decision-making, and policy initiatives, while preventing an unscrupulous or ill-intentioned party from exploiting the data for harmful ends. Our goal is to build on the latest advances in optimization, to which the OptTek Systems, Inc. (OptTek) research team has made pioneering contributions, to provide a framework based on optimal data switching that enables the Centers for Disease Control and Prevention (CDC) and other organizations to meet the challenge of confidentiality protection effectively. The proposed framework is designed to be easy to use in a wide array of application settings and user environments, from client-server to web-based, whether the micro-data are continuous, ordinal, binary, or any combination of these types.
The successful development of such a framework, and of the computer-based method for implementing it, is much needed and will be of value to many types of organizations, not only in the public sector but also in the private sector, where the incentive to publish data is both economic and scientific. Examples in the public sector are evident: organizations such as CDC and the U.S. Census Bureau exist to collect, analyze, and publish data for analysis by other parties. Numerous examples also arise in the private sector, notably in banking and financial services, healthcare (including drug companies and medical research institutions), market research, oil exploration, computational biology, renewable and sustainable energy, retail sales, product development, and a wide variety of other areas.
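The abstract does not describe how Optimal Data Switching selects the records to switch, so the following is only a minimal sketch of the general record-switching (data swapping) idea it builds on, assuming micro-data held as a list of Python dicts. The function name `switch_records`, its parameters, and the sample records are hypothetical, and swap partners are picked greedily in list order rather than by the optimization the project proposes. Because an attribute is swapped only between records that agree on a matching variable, the joint counts of those two variables, and any table built from them alone, stay unchanged.

```python
from collections import Counter


def switch_records(records, risky_ids, match_key, swap_attr):
    """Swap `swap_attr` between each risky record and an unused partner
    that has the same `match_key` value but a different `swap_attr` value.

    Hypothetical helper: partners are chosen greedily in list order, not by
    an optimization model. A risky record with no eligible partner is left
    unchanged. Returns a new list; `records` itself is not modified.
    """
    out = [dict(r) for r in records]
    used = set(risky_ids)  # risky records are never used as partners
    for i in risky_ids:
        for j, cand in enumerate(out):
            if j in used:
                continue
            if (cand[match_key] == out[i][match_key]
                    and cand[swap_attr] != out[i][swap_attr]):
                out[i][swap_attr], out[j][swap_attr] = out[j][swap_attr], out[i][swap_attr]
                used.add(j)
                break
    return out


# Illustrative micro-data (invented for this example).
records = [
    {"age_group": "30-39", "county": "A", "diagnosis": "X"},
    {"age_group": "30-39", "county": "B", "diagnosis": "Y"},
    {"age_group": "40-49", "county": "A", "diagnosis": "Y"},
    {"age_group": "40-49", "county": "C", "diagnosis": "X"},
]
switched = switch_records(records, risky_ids=[0], match_key="age_group", swap_attr="county")
# The age_group x county table is preserved by the swap.
print(Counter((r["age_group"], r["county"]) for r in switched) ==
      Counter((r["age_group"], r["county"]) for r in records))  # True
```

The county-by-diagnosis cross-tabulation does change in this example; that distortion is the utility cost which a more careful, optimized choice of swap partners would presumably aim to keep small.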

Public Health Relevance:
In accumulating and disseminating public health data for reporting, other uses, and statistical analysis, we must guarantee that individual records describing each person or establishment are protected. Organizations in both the public and private sectors have a major stake in this confidentiality protection problem, because access to data is essential for advancing research and formulating policy. This project proposes a robust methodology and practical framework that delivers an efficient and effective tool for protecting confidentiality in published tabular data.

Thesaurus Terms:
There Are No Thesaurus Terms On File For This Project.

Phase II

Contract Number: 2R44MH086138-02
Start Date: 4/1/10    Completed: 3/31/12
Phase II year
2010
(last award dollars: 2011)
Phase II Amount
$749,631

Statistical databases for public use pose a critical problem: how to make the data available for analysis without disclosing information that would infringe on privacy, violate confidentiality, or endanger national security. Organizations in the public and private sectors have a major stake in this confidentiality protection problem, because access to data is essential for advancing research and formulating policy. Yet the possibility of extracting certain sensitive elements of information from the data can jeopardize the welfare of these organizations and, potentially, the welfare of the society in which they operate. The challenge, therefore, is to represent the data in a form that permits accurate analysis for supporting research, decision-making, and policy initiatives, while preventing an unscrupulous or ill-intentioned party from exploiting the data for harmful ends. The objective of this project is to develop a practical, computer-based framework for assessing, measuring, and mitigating disclosure risk in public use data. Our proposed framework, called OptShield, overcomes the disadvantages of currently deployed disclosure limitation methods. It does so by combining perturbation and suppression methods with optimal switching of sensitive records at the micro-data level, producing a method that protects confidentiality while preserving data integrity. In Phase II we propose to continue algorithmic and software development to produce a working prototype of the software and service. This software will serve as the core technology for an application aimed at a broad market of customers with a major stake in confidentiality protection.
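OptShield is described as combining perturbation and suppression with record switching. The sketch below illustrates only the first two ingredients on a single frequency table, assuming a simple threshold rule for primary cell suppression and bounded integer noise for perturbation. The name `protect_table`, the threshold of 3, the noise bound of 1, and the sample table are illustrative assumptions, not parameters taken from the project.

```python
import random


def protect_table(table, threshold=3, max_noise=1, seed=0):
    """Apply primary suppression and bounded noise to a frequency table.

    Hypothetical helper. `table` maps a cell label to its count.
    - Cells with 0 < count < threshold are suppressed (replaced by None).
    - Zero cells are kept as 0, so no fake counts are created.
    - Other cells get integer noise drawn uniformly from
      [-max_noise, max_noise], floored at zero.
    A seeded generator makes the output reproducible.
    """
    rng = random.Random(seed)
    protected = {}
    for cell, count in table.items():
        if count == 0:
            protected[cell] = 0
        elif count < threshold:
            protected[cell] = None  # primary suppression of a small cell
        else:
            noisy = count + rng.randint(-max_noise, max_noise)  # inclusive bounds
            protected[cell] = max(noisy, 0)
    return protected


counts = {"county A": 12, "county B": 2, "county C": 0, "county D": 7}
print(protect_table(counts, threshold=3, max_noise=1, seed=42))
```

Primary suppression alone is rarely sufficient in practice: if row or column totals are also published, a single suppressed cell can be recovered by subtraction, so complementary (secondary) suppression is normally required as well. That step is omitted from this sketch.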
The application we ultimately plan to offer in Phase III will consist of a three-phased approach to the disclosure limitation problem: (1) Assess a user's qualitative and quantitative disclosure risks inherent in the organization's data publishing and sharing plans; (2) Measure the disclosure risks in a user's proposed data products; and (3) Protect the user's data by applying the appropriate disclosure limitation techniques.
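The abstract does not say which quantitative measure step (2) would use. One common measure, used here purely as an assumed example, counts the records whose combination of quasi-identifiers (variables such as age group and county that an intruder might link to outside data) is shared by fewer than k records. The function names `records_at_risk` and `risk_summary`, the default k = 3, and the sample data are hypothetical.

```python
from collections import Counter


def records_at_risk(records, quasi_ids, k=3):
    """Indices of records whose quasi-identifier combination occurs fewer than k times."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    class_sizes = Counter(keys)  # size of each equivalence class
    return [i for i, key in enumerate(keys) if class_sizes[key] < k]


def risk_summary(records, quasi_ids, k=3):
    """Count and share of records falling in equivalence classes smaller than k."""
    at_risk = records_at_risk(records, quasi_ids, k)
    share = len(at_risk) / len(records) if records else 0.0
    return {"n_records": len(records), "n_at_risk": len(at_risk), "share_at_risk": share}


data = [
    {"age_group": "30-39", "county": "A"},
    {"age_group": "30-39", "county": "A"},
    {"age_group": "30-39", "county": "A"},
    {"age_group": "80+", "county": "B"},
]
print(risk_summary(data, ["age_group", "county"]))
# {'n_records': 4, 'n_at_risk': 1, 'share_at_risk': 0.25}
```

Records flagged this way are natural candidates for the protection step (3), whether by switching, suppression, or perturbation.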

Public Health Relevance:
Public health organizations that collect and share sensitive data are apprehensive about the risk of inadvertently disclosing confidential information, given that access to their data is essential for advancing research and formulating policy. Yet the possibility of extracting certain vulnerable elements of information from the data, even after personal identifiers have been removed, can jeopardize the welfare of these organizations and, potentially, the welfare of the society in which they operate. Within the US Department of Health and Human Services, for example, preserving the confidentiality of records in order to continue eliciting information from the American people and from health care providers is "a matter of primary concern" (CDC/NCHS confidentiality guide). OptTek Systems, Inc. (OptTek) is developing a comprehensive framework designed to help public health and other organizations avoid disclosing confidential information in public-use data. The application consists of a three-phased approach to the disclosure limitation problem: (1) Assess a user's qualitative and quantitative disclosure risks; (2) Measure the disclosure risks in a user's proposed data publishing and sharing plans; and (3) Protect the user's data by applying the appropriate disclosure limitation techniques.

Thesaurus Terms:
Algorithms; American; Area; Arts; Binding; Binding (Molecular Function); Businesses; Cells; Computer Programs; Computer Software; Computers; Confidential Information; Confidentiality; Data; Data Banks; Data Bases; Data Quality; Data Set; Databank, Electronic; Databanks; Database, Electronic; Databases; Dataset; Decision Making; Department Of Health And Human Services; Department Of Health And Human Services (U.S.); Disadvantaged; Disclosure; Elements; Feedback; Fostering; Hhs; Health Care Providers; Health Personnel; Healthcare Providers; Healthcare Worker; Information Disclosure; Investigators; Lead; Manuals; Marketing; Measurement; Measures; Methods; Methods And Techniques; Methods, Other; Molecular Interaction; Nchs; Nimh; National Center For Health Statistics; National Center For Health Statistics (U.S.); National Institute Of Mental Health; National Institute Of Mental Health (U.S.); National Security; Nature; Pb Element; Phase; Policies; Privacy; Private Sector; Public Health; Publishing; R01 Mechanism; R01 Program; Rpg; Records; Research; Research Grants; Research Personnel; Research Project Grants; Research Projects; Research Projects, R-Series; Research Support; Researchers; Resort; Resource Sharing; Risk; Sbir; Sbirs (R43/44); Secure; Services; Simulate; Small Business Innovation Research; Small Business Innovation Research Grant; Social Welfare; Societies; Software; Strategic Planning; System; System, Loinc Axis 4; Techniques; Technology; Testing; Time; United States Department Of Health And Human Services; United States Dept. Of Health And Human Services; United States National Center For Health Statistics; United States National Institute Of Mental Health; Work; Writing; Base; Clinical Data Repository; Clinical Data Warehouse; Computer Program/Software; Data Integrity; Data Repository; Design; Designing; Develop Software; Developing Computer Software; Experience; Experiment; Experimental Research; Experimental Study; Health Care Personnel; Health Care Worker; Health Organization; Health Provider; Healthcare Personnel; Heavy Metal Pb; Heavy Metal Lead; Improved; Innovate; Innovation; Innovative; Interest; Medical Personnel; Prevent; Preventing; Privacy Of Information; Prototype; Public Health Medicine (Field); Public Health Relevance; Relational Database; Research Study; Sharing Data; Software Development; Tool; Treatment Provider; Usability; Welfare