SBIR-STTR Award

Whole Genome Sequencing - Data to Insight in One Hour
Award last edited on: 11/5/2018

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$973,104
Award Phase
2
Solicitation Topic Code
BT
Principal Investigator
Mehrzad Samadiarakhshbahar

Company Information

ParaBricks LLC (AKA: ParaBricks Inc)

2985 Hickory Lane
Ann Arbor, MI 48104
   (734) 355-3594
   info@parabricks.com
   www.parabricks.com
Location: Single
Congr. District: 12
County: Washtenaw

Phase I

Contract Number: 1647990
Start Date: 12/15/2016    Completed: 5/31/2017
Phase I year
2016
Phase I Amount
$225,000
The broader impact/commercial potential of this Small Business Innovation Research (SBIR) project will be to provide deep insights into the DNA of patients in one hour at one-fourth the cost. This will allow hospitals, clinics, and research centers to analyze patients' genetic information faster and return essential insights to physicians, leading to faster decisions on therapy. Analyzing DNA data holds the promise of detecting several diseases and can also help pinpoint their genetic origins, which will be key for the treatment of vulnerable cases such as newborn babies, people with rare diseases, and pregnant women. By providing the analysis of whole DNA data in one hour, as compared to several days, DNA tests can become mainstream, thereby reducing anxiety among patients and their relatives. As the number of patients for whom deep DNA analysis will be required is doubling every year, this project aims to meet the exploding demands of large-scale computational genomics and enable deep DNA analysis for all patients.
This SBIR Phase I project proposes to use the power of state-of-the-art cloud computing platforms to provide analysis of Whole Genome Sequencing (WGS) data in one hour. Several key researchers have shown that WGS data is a critical requirement for accurate insights into, and detailed analysis of, various diseases, including leukemia, breast cancer, ADHD, Alzheimer's, congenital heart disease, HIV susceptibility, and others, because information in the non-coding region is required. However, the computational analysis of WGS data takes several days and is the major bottleneck in using WGS data to personalize treatment for the affected patient. This project aims to use several high-performance computing techniques on the cloud that will be tailored for next-generation sequencing (NGS) analyses and can accelerate the process by more than 40 times. This project uses a disruptive technology that breaks algorithms into parts that work independently on cloud nodes, and the team has created a collection of software optimizations to improve the utilization of cloud resources. This toolbox of optimizations is being applied to commonly used software tools in computational genomics for faster analysis.
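The idea of breaking an analysis into pieces that run independently can be illustrated with a minimal sketch. It is not ParaBricks' implementation: the function names, the fixed-size region split, and the placeholder per-region work are all assumptions made for illustration, and a local thread pool stands in for cloud nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_regions(contig_lengths, chunk_size):
    """Cover each contig with half-open (contig, start, end) intervals."""
    regions = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((contig, start, min(start + chunk_size, length)))
    return regions


def analyze_region(region):
    """Hypothetical placeholder for per-region work: returns the region length."""
    _contig, start, end = region
    return end - start


def run_independently(contig_lengths, chunk_size, workers=4):
    """Process every region on its own; no region depends on another."""
    regions = split_into_regions(contig_lengths, chunk_size)
    # Assumption: threads stand in for independent cloud nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(analyze_region, regions))
    return regions, results


if __name__ == "__main__":
    regions, results = run_independently({"chr1": 250, "chr2": 100}, chunk_size=100)
    for region, value in zip(regions, results):
        print(region, value)
```

Because each region is processed without reference to any other, the same split could in principle be distributed across machines instead of threads.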

Phase II

Contract Number: 1758644
Start Date: 9/15/2018    Completed: 8/31/2020
Phase II year
2018
Phase II Amount
$748,104
The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase II project is to enhance precision medicine by reducing data processing time and cost, leading to faster results, higher genomic data processing throughput, and ultimately better treatment of patients. Many diseases have their impacts locked in DNA data. Unlocking the data for an individual may provide a significant boost to human healthcare. By providing Whole Genome Sequencing (WGS) data analysis in one hour, as compared to the current standard of several days, and at a fraction of today's cost, DNA tests can become mainstream and research can be accelerated. WGS data is becoming a key component in ensuring correct and timely treatment of vulnerable populations, including newborn babies, people with cancer and rare diseases, and pregnant women. Correct prediction and treatment based on WGS data will extend human lives and promote better quality of life by accelerating the development and broad implementation of precision medicine.
The intellectual merit of the proposed activity is to develop a cloud-based framework to accelerate whole genome sequencing data analysis. This SBIR Phase II project creates a software framework in which multiple industry-standard genomic analyses will be accelerated transparently to run orders of magnitude faster, while reducing computational costs by up to 4x. The proposed innovation solves the scaling, performance, and cost challenges of WGS computing through novel parallelism extraction and mapping techniques. The framework under development for accelerating genomic analysis targets GPUs and traditional processors (CPUs). Using this framework, secondary data analysis programs can use all the computing resources present in a standard computing node (CPU cores and GPUs), leading to higher utilization of the system and higher throughput. The GPU accelerates the data-parallel portion of the software, while the CPU is mainly responsible for orchestrating data movement between CPUs and GPUs and for load balancing across multiple accelerators. Because the framework runs both in the cloud and on premises, users can scale to meet the exploding demands of the DNA analysis industry. The software generates results identical to those of industry-standard tools and does not sacrifice the configurability that is critical for users. This framework will also enable researchers and medical professionals to analyze several thousand genomes simultaneously, leading to higher quality results.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
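The CPU-side load balancing across multiple accelerators mentioned in the Phase II abstract could look roughly like the sketch below. The award text does not describe the actual scheduling method; this example assumes a simple greedy heuristic (largest batch first, always onto the least-loaded device), and `balance_batches`, the cost values, and the device numbering are hypothetical.

```python
import heapq


def balance_batches(batch_costs, num_devices):
    """Greedily assign batches to devices, largest first, least-loaded device next.

    batch_costs: estimated cost of each batch (hypothetical units).
    Returns (assignment, loads), both keyed by device index.
    """
    # Min-heap of (current_load, device_index); the least-loaded device pops first.
    heap = [(0, device) for device in range(num_devices)]
    heapq.heapify(heap)
    assignment = {device: [] for device in range(num_devices)}

    # Placing expensive batches first tends to even out the final loads.
    order = sorted(range(len(batch_costs)), key=lambda i: batch_costs[i], reverse=True)
    for batch in order:
        load, device = heapq.heappop(heap)
        assignment[device].append(batch)
        heapq.heappush(heap, (load + batch_costs[batch], device))

    loads = {d: sum(batch_costs[i] for i in batches) for d, batches in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    assignment, loads = balance_batches([5, 3, 3, 2, 2, 1], num_devices=2)
    print(assignment)
    print(loads)
```

This heuristic is only one of many possible policies; a real scheduler would also have to account for data transfer time between CPU and GPU memory.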