SBIR-STTR Award

From Lab to Algorithm: Cloud-based Biological Data Preparation, Tracking, and Checking for AI-readiness
Award last edited on: 9/5/22

Sponsored Program
SBIR
Awarding Agency
DOE
Total Award Amount
$249,878
Award Phase
1
Solicitation Topic Code
C53-01a
Principal Investigator
Anastasia Deckard

Company Information

Geometric Data Analytics

636 Rock Creek Road
Chapel Hill, NC 27514
   (919) 448-7871
   N/A
   www.geomdata.com
Location: Single
Congr. District: 04
County: Orange

Phase I

Contract Number: DE-SC0022400
Start Date: 2/14/22    Completed: 2/13/23
Phase I year
2022
Phase I Amount
$249,878
For large-scale analysis of biological systems, moving from data to analysis to interpretable results is generally very slow, and failures are often not discovered until the end of the process. Complicated, evolving data is run through a variety of changing analysis scripts without standardization or provenance tracking, which causes reproducibility issues. These issues have a negative impact on the quality, speed, and cost of biological data analysis projects. We propose a software system that addresses these issues consistently, flexibly, and proactively, and that produces analysis-ready or AI/ML-ready biological data and metadata. To make analysis faster and easier, we propose a cloud-based microservice architecture that provides services and pipelines for users. To find and reduce data issues, there will be services for data standardization, geometrical/statistical analysis, and data summarization. To increase the reproducibility of results, we propose a system that tracks data provenance, algorithm versions, and data identifiers. In Phase I, we propose to build the cloud-based system of microservices, which are executed by configurable pipelines and which track provenance and versioning. We plan to provide an initial set of services that includes standardizing data, metadata, and QC/QA data; identifying potential data issues; quantifying performance issues; and locating the sources of issues. We will begin with two data types, transcriptomics and proteomics, while keeping the system flexible enough to add more types later. This system would be used by medium to large corporations and government research labs that work in applied areas such as biomedical, pharmaceutical, or agbio research. These researchers work on topics such as diagnostics, synthetic biology, and drug development, and they process genomics, transcriptomics, and proteomics data.
Decreasing the time from experiment to results creates an economic benefit for both companies and the public. Researchers spend less and get results faster, which saves them both money and time. These savings can also benefit the public, who receive new products more quickly. Detecting issues in the data increases the correctness of a company's results, improves the quality of the product for consumers, and improves consumers' confidence in the company.

Phase II

Contract Number: ----------
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
----
Phase II Amount
----