SBIR-STTR Award

A software tool to facilitate variable-level equivalency and harmonization in research data: Leveraging the NIH Common Data Elements Repository to link concepts and measures in an open format
Award last edited on: 2/8/2024

Sponsored Program
SBIR
Awarding Agency
NIH : NIA
Total Award Amount
$275,541
Award Phase
1
Solicitation Topic Code
866
Principal Investigator
Dan Smith

Company Information

Algenta Technologies LLC (AKA: Dragonmount Networks)

1428 Washington Avenue South Suite 203
Minneapolis, MN 55454
   (608) 213-1637
   jeremy@algenta.com
   www.algenta.com
Location: Single
Congr. District: 05
County: Hennepin

Phase I

Contract Number: 2023
Start Date: ----    Completed: 9/18/2023
Phase I year
2023
Phase I Amount
$275,541
The National Institute on Aging (NIA) supports numerous studies and archives that collect and disseminate critical data about the aging population of the United States. By supporting the collection and dissemination of longitudinal and multidisciplinary data, the NIA provides researchers the opportunity to measure change and stability in individuals over time, as well as to investigate aging phenomena from an integrated theoretical perspective. In both cases, equivalent or related variables must first be linked or merged before appropriately documented data products can be produced for eventual harmonization and analysis. The current aging research data environment provides many opportunities for linking similar topical datasets and harmonizing extant common variables, but few software tools are available to facilitate this resource-intensive task. The proposed project will demonstrate the feasibility of a guided harmonization software prototype by concording variables from three nationally representative NIA-funded studies (MIDUS, NHATS, NSHAP) and mapping them against extant data element concept sources such as the NIH Common Data Elements library to identify equivalent concepts and variables. The software prototype will use machine learning and advanced text analysis algorithms to guide the creation of concorded databases (variable crosswalks) that support harmonization and discoverability, both within and across aging-related statistical datasets. Additionally, the prototype will use an open-standards metadata framework to produce richly described concordance databases that are interoperable, citable, and FAIR. Colectica has a track record of creating open-standards-based software tools that reduce data management burden by automatically extracting structured metadata from macro-level (study) and micro-level (variable) characteristics of aging studies.
Specifically, the prototype will evaluate the feasibility of human-in-the-loop algorithms to operate as a "recommendation engine" to guide the concordance of potentially equivalent or similar variables among multiple datasets. The core hypothesis posits that the prototype will significantly decrease the labor, time, and resources required to create accurate and standardized concorded databases. To test this hypothesis, the research team will: construct and evaluate recommendation algorithms for variable concordance (Aim 1); establish metrics for measuring the accuracy and effectiveness of concordance (Aim 2); and create a user interface to test the recommendation engine, its functions, and associated inputs and outputs (Aim 3).
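As an illustration only, the recommendation step in Aim 1 and an accuracy metric of the kind Aim 2 calls for can be sketched as ranking candidate matches by the similarity of variable labels. This is a hypothetical minimal example, not the project's algorithm: it assumes plain bag-of-words cosine similarity on question text, and the variable names and labels (A_SLEEP, B_SLEEPHRS, and so on) are invented stand-ins, not actual MIDUS, NHATS, or NSHAP variables. A human reviewer would accept or reject each top suggestion, and precision at rank 1 against those reviewer-confirmed pairs gives one simple accuracy measure.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: rank candidate matches between two studies' variables
# by bag-of-words cosine similarity of their labels. Not the project's method.


def tokens(label: str) -> Counter:
    """Lower-case word counts for a variable label."""
    return Counter(re.findall(r"[a-z0-9]+", label.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(source: dict[str, str], target: dict[str, str], k: int = 2) -> dict[str, list[tuple[str, float]]]:
    """For each source variable, return up to k target variables, most similar first."""
    ranked = {}
    for s_name, s_label in source.items():
        s_vec = tokens(s_label)
        scores = [(t_name, cosine(s_vec, tokens(t_label))) for t_name, t_label in target.items()]
        # Sort by descending score; break ties by name so output is deterministic.
        scores.sort(key=lambda pair: (-pair[1], pair[0]))
        ranked[s_name] = scores[:k]
    return ranked


def precision_at_1(ranked: dict[str, list[tuple[str, float]]], confirmed: dict[str, str]) -> float:
    """Share of source variables whose top suggestion matches the reviewer-confirmed pair."""
    if not ranked:
        return 0.0
    hits = sum(1 for s, cands in ranked.items() if cands and cands[0][0] == confirmed.get(s))
    return hits / len(ranked)


# Invented example variables standing in for two studies' codebooks.
study_a = {
    "A_SLEEP": "Hours of sleep per night",
    "A_PAIN": "Pain interferes with daily activities",
    "A_SMOKE": "Currently smokes cigarettes",
}
study_b = {
    "B_SLEEPHRS": "How many hours do you sleep at night",
    "B_PAINLIM": "Does pain limit your daily activities",
    "B_CIG": "Do you smoke cigarettes now",
}

ranked = recommend(study_a, study_b)
for name, candidates in ranked.items():
    print(name, [(t, round(score, 3)) for t, score in candidates])

# Pairs a human reviewer accepted after inspecting the suggestions.
confirmed = {"A_SLEEP": "B_SLEEPHRS", "A_PAIN": "B_PAINLIM", "A_SMOKE": "B_CIG"}
print("precision@1:", precision_at_1(ranked, confirmed))
```

Because no stemming is applied, "smokes" and "smoke" do not match, so A_SMOKE scores noticeably lower than the other pairs even though its top suggestion is correct. A more capable engine would likely add text normalization, synonym handling, or learned representations, and weak scores like this one are where human review matters most.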

Public Health Relevance Statement:
The current research data environment provides many opportunities for linking similar topical datasets and harmonizing extant common variables, but few software tools are available to facilitate this resource-intensive task. This project proposes using an open-standards framework to assemble richly described datasets that are mapped against the NIH Common Data Elements (CDE) library to identify equivalent concepts and variables. Machine learning will guide data managers through this process and produce variable crosswalks that can aid harmonization and discoverability both within and across studies and datasets.

Project Terms:
Aging; Algorithms; Archives; Data Sources; Environment; Libraries; Manuals; Maps; United States National Institutes of Health; NIH; National Institutes of Health; Pain; Painful; Public Health; Questionnaires; Recommendation; Research; Research Personnel; Investigators; Researchers; Resources; Research Resources; Computer software; Software; Software Tools; Computer Software Tools; software toolkit; Standardization; Technology; Testing; Time; United States; Measures; Artifacts; Morphologic artifacts; Data Set; Phase; Link; Individual; Data Files; Data Bases; data base; Databases; Funding; tool; Machine Learning; machine based learning; Source; data management; Structure; Modeling; repository; depository; Documentation; Meta-Analysis; Effectiveness; Data; Data Element; Collection; Common Data Element; Characteristics; Process; Text; Development; developmental; Metadata; meta data; Output; National Institute on Aging; National Institute of Aging; cost; Data Coordination Center; data management and coordinating center; data management center; Data Coordinating Center; Outcome; interoperability; multidisciplinary; tool development; prototype; aged population; population aging; aging population; data sharing; data reduction; Algorithmic Analyses; Analyses of Algorithms; Analysis of Algorithms; Algorithmic Analysis; human-in-the-loop; FAIR data; FAIR guiding principles; Findable, Accessible, Interoperable and Re-usable; Findable, Accessible, Interoperable, and Reusable; FAIR principles; individual heterogeneity; individual variability; individual variation; AD related dementia; ADRD; Alzheimer's and related dementias; Alzheimer's disease and related dementia; Alzheimer's disease and related disorders; Alzheimer's disease or a related dementia; Alzheimer's disease or a related disorder; Alzheimer's disease or related dementia; Alzheimer's disease related dementia; multiple data sets; multiple datasets; harmonized data; data harmonization; metadata standards; artificial intelligence algorithm; AI algorithm

Phase II

Contract Number: 1R43AG085861-01
Start Date: 8/31/2024    Completed: 00/00/00
Phase II year
----
Phase II Amount
----