SBIR-STTR Award

Integrated Software Application For Management Of Hepatitis C Virus Data
Award last edited on: 3/21/13

Sponsored Program
SBIR
Awarding Agency
NIH : NIAID
Total Award Amount
$1,581,433
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Johanna C Craig

Company Information

Gataca LLC

180 Orchard Hill Lane
Newport, VA 24128
   (540) 544-3033
   research@gatacallc.com
   www.gatacallc.com
Location: Single
Congr. District: 09
County: Giles

Phase I

Contract Number: 1R43AI084307-01
Start Date: 6/19/09    Completed: 5/31/10
Phase I year
2009
Phase I Amount
$149,950
The hepatitis C virus (HCV) infects approximately 4 million people in the U.S. The high mutation rate of HCV generates vast numbers of new genetic sequences and associated biological data in the daily conduct of laboratory research and clinical trials, creating serious data management problems. Investigators currently rely on homegrown databases, generic software products, and tools from public web repositories to sort, organize and analyze their genomic and biological data. These tools are not tailored to the HCV genome, and moving data from one program to the next is labor intensive and vulnerable to error. We are developing a desktop software product tailored for the rapid, efficient and flexible management of HCV data. The product consists of graphical user interface (GUI) tools and a data storage and retrieval system, both designed specifically for HCV analysis. It also includes a commercial relational database engine. The most notable technical innovation is our annotation tool, which simplifies the capture, storage and management of crucial experimental data points and brings these user-defined data points (annotations) into the same searchable context as those that are inherently systemic and structured. Other innovations include our alignment, phylogenetics and mutation analysis tools, which are specifically tailored to the mathematics of the HCV replication rate and its error-prone polymerase. In preliminary work we designed, built and successfully unit-tested a prototype software system. The software architecture for that product consists of three tiers: presentation (GUI), middleware (Domain), and a relational database management system (RDBMS). In Phase I we will develop an alignment tool that will be linked to the existing query tool and include a contig assembler (Aim I) for analyzing complete and partial genomic sequences. We will also develop a phylogeny tool for assembling alignments into evolutionary trees that will color-code and time-stamp the input sequences (Aim II), and a graphics tool that will present the raw electropherogram data (traces) and assemble line and bar graphs to plot up to two variables (Aim III). In Phase II we will develop additional tools for mutation tracking, report generation and entropy measurement, and we will develop statistical routines and security and installation packages. Specific Aims: Aim I. Develop an alignment tool; Aim II. Develop a phylogeny tool; Aim III. Develop a graphics tool; Aim IV. Unit test the software application. We are seeking funds to build our product, addressing a scientific need with a marketable bioinformatics approach. In this way, we will merge informatics with basic research for rapid discovery. We believe that our disease-specific software products will aid the rapidly developing market of HCV research. The result will be software that greatly improves analysis capabilities and reduces data processing time. These goals fall well within the scope of the NIH to promote basic research in the field of bioinformatics and information sciences, and could lead to enormous public benefit.

Public Health Relevance:
The hepatitis C virus is difficult to study and is not effectively treated with antiviral drugs; fewer than 50% of patients respond favorably to current therapies. Efficacious options are years away. A major problem that investigators face is the rapid mutation rate of the virus and the difficult data management problems that result from it. We are developing a powerful software product that will make it easier for scientists to overcome these data management problems. Moreover, our design will relieve the serious bottleneck of data management, significantly compressing the time between data collection and cure discovery.

Project Terms:
AIDS Seroconversion; AIDS Seropositivity; Address; Anti-HIV Positivity; Architecture; Basic Research; Basic Science; Bio-Informatics; Bioinformatics; Biological; Carcinoma of the Liver Cells; Cause of Death; Chronic; Cirrhosis; Clinical; Clinical Trials; Clinical Trials, Unspecified; Code; Coding System; Color; Computer Programs; Computer software; Data; Data Banks; Data Bases; Data Collection; Data Storage and Retrieval; Databank, Electronic; Databanks; Database Management Systems; Database, Electronic; Databases; Development; Disease; Disorder; Drugs; Drugs, Nonproprietary; Engineering / Architecture; Entropy; Face; Fibrosis; Funding; Generations; Generic Drugs; Genetic; Genetic Alteration; Genetic Change; Genetic defect; Genome; Genomics; Goals; Graph; Graphical interface; HCC; HCV; HIV Antibody Positivity; HIV Positive; HIV Positivity; HIV Seroconversion; HIV Seropositivity; HTLV-III Seroconversion; HTLV-III Seropositivity; Hepatic Disorder; Hepatitis C virus; Hepatitus C; Hepatocellular Carcinoma; Hepatocellular cancer; Hepatoma; Infection; Informatics; Information Sciences; Internet; Investigators; Laboratory Research; Lead; Link; Liver diseases; Marketing; Mathematics; Measurement; Medication; Mutation; Mutation Analysis; NIH; National Institutes of Health; National Institutes of Health (U.S.); Pathology; Patients; Pb element; Pharmaceutic Preparations; Pharmaceutical Preparations; Phase; Phylogenetic Analysis; Phylogenetics; Phylogeny; Polymerase; Primary carcinoma of the liver cells; Programs (PT); Programs [Publication Type]; Reporting; Research; Research Personnel; Researchers; Scientist; Security; Software; Sorting - Cell Movement; Staging; Structure; System; System, LOINC Axis 4; Systems, Data Base Management; Testing; Time; Trees; United States National Institutes of Health; Viral; Virus; Virus Replication; Viruses, General; WWW; Work; antibody positive AIDS test; antigen positive AIDS test; clinical data repository; clinical data warehouse; clinical investigation; computer program/software; computerized data processing; data management; data processing; data repository; data retrieval; data storage; design; designing; disease/disorder; drug/agent; facial; falls; flexibility; generic; genome mutation; graphic user interface; graphical user interface; heavy metal Pb; heavy metal lead; hepatopathy; improved; innovate; innovation; innovative; liver disorder; middleware; programs; prototype; public health relevance; relational database; relational database management systems; repository; seropositive (AIDS test); signal processing; software systems; sorting; tool; virus multiplication; web; world wide web

Phase II

Contract Number: 2R44AI084307-02
Start Date: 6/19/09    Completed: 2/28/14
Phase II year
2012
(last award dollars: 2013)
Phase II Amount
$1,431,483

The hepatitis C virus (HCV) infects approximately 4 million people in the U.S. and 170 million people worldwide. The high mutation rate of HCV generates vast numbers of new genetic sequences and associated biological data in the daily conduct of laboratory research and clinical trials, creating serious data management problems. Investigators currently rely on databases developed in-house, generic software products, and tools from public web repositories to sort, organize and analyze their genomic and biological data. These tools are not tailored to the HCV genome, and moving data from one program to the next is labor intensive and vulnerable to error. We are developing a web-based (vs. local server only) software product tailored for the rapid, efficient and flexible management of HCV data. The product consists of graphical user interface (GUI) tools and a data storage and retrieval system, both designed specifically for HCV analysis. It also includes a commercial relational database engine. The most notable technical innovation is our annotation tool, which simplifies the capture, storage and management of crucial experimental data points and brings these user-defined data points (annotations) into the same searchable context as those that are inherently systemic and structured. Other innovations include our alignment, phylogenetics and mutation analysis tools, which are specifically tailored to the mathematics of the HCV replication rate and its error-prone polymerase. In preliminary and Phase I work we designed, built and successfully unit-tested a prototype software system consisting of three tiers: presentation (GUI), middleware (Domain), and a relational database management system (RDBMS). The prototype includes tools for conducting HCV-tailored alignments and contig assemblies that are linked to a highly flexible query tool, tools for assembling and viewing phylogenetic trees, and a graphics tool that presents the raw electropherogram data (traces) and assembles line and bar graphs to plot up to two variables. In this Phase II work, we will develop additional tools for mutation tracking, report generation and entropy measurement, and we will develop statistical routines and security and installation packages. Specific Aims: Aim I. Transition the software platform to a cloud computing and hosting environment; Aim II. Develop a suite of tools for conducting full mutation analysis; Aim III. Develop statistical routines; Aim IV. Unit test the software application. We are seeking funds to build our product, addressing a scientific need with a marketable bioinformatics approach. In this way, we will merge informatics with basic research for rapid discovery. We believe that our disease-specific software products will aid the rapidly developing market of HCV research. The result will be software that greatly improves analysis capabilities and reduces data processing time. These goals fall well within the scope of the NIH to promote basic research in the field of bioinformatics and information sciences, and could lead to enormous public benefit.
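
As an illustration of the entropy measurement listed among the Phase II tools, the following minimal Python sketch computes per-column Shannon entropy for a set of aligned sequences. The abstract does not state which entropy measure or programming language the product uses, so the choice of Shannon entropy, the function names (column_entropy, alignment_entropy) and the rule of ignoring gap characters are all assumptions made for illustration only.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column.

    Assumption: gap characters ("-") are ignored rather than
    counted as a fifth symbol.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Return one entropy value per column of an aligned sequence set."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(col)) for col in zip(*sequences)]


if __name__ == "__main__":
    # Hypothetical toy alignment, not real HCV data.
    toy = ["ACGT", "ACGA", "ACCA", "AC-A"]
    for position, value in enumerate(alignment_entropy(toy), start=1):
        print(f"column {position}: {value:.3f} bits")
```

Columns where every sequence carries the same base score zero, while columns with a mix of bases score higher, so a profile like this gives one rough way to flag highly variable positions in a set of viral sequences.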

Public Health Relevance:
The hepatitis C virus (HCV) is difficult to study and is not effectively treated with the current antiviral drug combination. Effective treatment options are years away. A major problem that HCV investigators must contend with is the rapid mutation rate of the viral genes, which creates a need to test patients continuously and leads to a massive accumulation of data. We are developing a powerful, game-changing software application that will make it easier for scientists to overcome these problems and focus on treatment and cure discovery.