SBIR-STTR Award

Exact Statistical Tools for Genetic Association Studies
Award last edited on: 12/29/14

Sponsored Program
SBIR
Awarding Agency
NIH : NHGRI
Total Award Amount
$1,108,108
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Pralay Senchaudhuri

Company Information

Cytel Software Corporation (AKA: Cytel, Inc)

675 Massachusetts Avenue #3
Cambridge, MA 02139
   (617) 661-2011
   info@cytel.com
   www.cytel.com
Location: Single
Congr. District: 05
County: Middlesex

Phase I

Contract Number: 1R43HG004027-01A1
Start Date: 6/1/10    Completed: 11/30/10
Phase I year
2010
Phase I Amount
$112,075
The overall goal of our research is to develop and extend powerful exact statistical tools for testing genetic association, and to incorporate these methods into two existing, widely used software packages (Cytel Studio, SAS) that will serve the needs of data analysts in pharmaceuticals, genetic epidemiology and public health, and other fields which require a greater understanding of the genetic determinants of complex disease. The demand for these analytic tools is rising dramatically, as rapid progress in genotyping technology is making it easier and less costly to measure sampled subjects for ever larger numbers of genetic markers. Genetic association represents an observed correlation between an investigative genetic marker and some physical trait, and can be assessed using either traditional case-control or family-based study designs. In either case, there are compelling applications of permutation or exact statistical approaches that are computationally challenging, yet are simply unavailable in currently used software or are implemented in a manner that requires excessive memory or computation. The computational innovations developed for this project will fill this gap, significantly improving the efficiency and power of existing tools used for genetic association under both family-based and case-control designs. During Phase I, we will build a prototype computer program that includes (i) exact family-based tests for both biallelic and multiallelic markers, and (ii) a permutation procedure that simultaneously tests genetic association assuming various modes of inheritance (i.e., recessive, dominant, additive, or codominant). We will also investigate the feasibility of incorporating these procedures into a SAS PROC, complementing and extending currently implemented SAS JMP Genomics procedures for testing genetic association. 
As a part of Phase II, we will integrate our Phase I tools into Cytel's StatXact system and into the SAS JMP Genomics system as an external procedure. We will additionally (i) extend the exact family-based procedures to accommodate haplotype data, (ii) develop and implement algorithms for permutation approaches to large-scale screening experiments, (iii) incorporate exact versions of basic genetic epidemiologic procedures, and (iv) incorporate efficient Monte Carlo sampling tools to extend the usefulness of the exact procedures to larger data sets.

Public Health Relevance:
Rapid progress in genotyping technology is making it easier and less costly to identify increasingly large numbers of genetic markers from sampled humans. These markers can be used to identify new genes potentially associated with many complex diseases. This project will provide genetics researchers with more accurate and efficient statistical tools for analyzing data from these studies.

Thesaurus Terms:
Accounting; Algorithms; Alleles; Allelomorphs; Analysis, Data; Articulation; Complement; Complement Proteins; Complex; Computer Programs; Computer Programs And Programming; Computer Software; Dna; Data; Data Analyses; Data Banks; Data Bases; Data Set; Databank, Electronic; Databanks; Database, Electronic; Databases; Dataset; Deoxyribonucleic Acid; Development And Research; Disease; Disorder; Epidemiology; Equilibrium; Family; Gwas; Genes; Genetic; Genetic Determinism; Genetic Heterogeneity; Genetic Markers; Genetic Screening; Genetic Analyses; Genomics; Goals; Haplotypes; Human; Human, General; Investigators; Joints; Knowledge; Link; Location; Man (Taxonomy); Man, Modern; Measurement; Measures; Memory; Methods; Network-Based; Pedigree; Pharmaceutical Agent; Pharmaceuticals; Pharmacologic Substance; Pharmacological Substance; Phase; Polymorphism, Single Base; Population; Procedures; Public Health; R & D; R&D; Research; Research Design; Research Personnel; Researchers; Snp; Snps; Subgp; Sampling; Screening Procedure; Single Nucleotide Polymorphism; Software; Statistical Methods; Study Type; Subgroup; System; System, Loinc Axis 4; Testing; Association Test; Balance; Balance Function; Base; Case Control; Clinical Data Repository; Clinical Data Warehouse; Computer Program; Computer Program/Software; Computer Programming; Data Repository; Design; Designing; Disease/Disorder; Experience; Experiment; Experimental Research; Experimental Study; Genetic Analysis; Genetic Association; Genetic Determinant; Genetic Epidemiology; Genetic Pedigree; Genome Wide Association Scan; Genome Wide Association Studies; Genome Wide Association Study; Genome-Wide Scan; Genomewide Association Scan; Genomewide Association Studies; Genomewide Association Study; Genomewide Scan; Genotyping Technology; Improved; Innovate; Innovation; Innovative; Pedigree Structure; Prototype; Public Health Medicine (Field); Public Health Relevance; Relational Database; Research And Development; Research Study; Screening; Screenings; Study Design; Tool; Trait; User Friendly Computer Software; User Friendly Software; Whole Genome Association Studies; Whole Genome Association Study

Phase II

Contract Number: 2R44HG004027-02
Start Date: 6/1/10    Completed: 12/31/14
Phase II year
2013
(last award dollars: 2014)
Phase II Amount
$996,033

The overall goal of our research is to develop and extend efficient exact statistical tools for testing genetic association, and to incorporate these methods into existing, widely used software packages that will serve the needs of data analysts in pharmaceuticals, epidemiology, public health, and other fields seeking to better understand the genetic causes of complex disease. The demand in this research area for greater statistical and computational innovation is rising dramatically, as rapid progress in genotyping technology is making it easier and less costly to measure sampled subjects for ever-larger numbers of genetic markers. Such investigative markers now predominantly include individual base pair mutations (referred to as single nucleotide polymorphisms or SNPs) along strands of cellular DNA. Marker panels of 1-2M SNPs are now common for genome-wide studies, and developing technologies (such as exome or whole-genome sequencing) will allow routine comparisons over marker sets that are orders of magnitude larger. With so many hypothesis tests, the need to preserve the rate of false positive findings presents some critical statistical and computational difficulties. Existing methods and their implementations often perform poorly under common conditions. The procedures developed during both phases of our project will significantly improve the efficiency, accuracy, and statistical power of genetic association tests, both for current GWAS panels and for next-generation technologies that are yielding even greater volumes of data. This project represents the joint efforts of investigators who are at the forefront of methodological research into genetic association, and software developers who have extensive experience in making cutting-edge exact statistical methods available in user-friendly software.
In this project, we will extend the work begun during Phase I by (1) implementing a battery of exact multiple testing procedures for genetic association studies with case-control data, and making their performance significantly more efficient by using a parallel processing approach; (2) developing and implementing new multiple testing procedures for family-based association studies; (3) providing a framework that will allow our parallel processing programs to be as widely compatible as possible with modern personal computing hardware; and (4) additionally incorporating the procedures within a SAS PROC, and developing an interface that will allow users to access R functions and objects while using StatXact.

Public Health Relevance:
Studies of complex disease and genetics now commonly use thousands or even millions of different genetic markers. Conventional statistical analyses in such studies can suffer from a variety of challenges, connected primarily to controlling the rate of false positive findings when carrying out so many individual hypothesis tests. We propose to develop commercial software with computationally more efficient and robust procedures for modern genetic association studies.

Project Terms:
Accounting; Algorithms; Alleles; Area; base; Base Pairing; case control; Complex; Computer software; conditioning; Conservatism; Data; design; Disease; DNA; exome; experience; Family; Gene Frequency; Genetic; genetic analysis; genetic association; Genetic Markers; genome sequencing; genome wide association study; genotyping technology; Goals; Hereditary Disease; Hybrids; improved; Individual; innovation; Joints; Measures; Memory; Methods; Mutation; Network-based; next generation; parallel processing; Performance; Pharmacoepidemiology; Phase; platform-independent; Procedures; programs; public health medicine (field); Relative (related person); Research; research and development; Research Design; Research Personnel; Sampling; Single Nucleotide Polymorphism; Statistical Methods; System; Technology; Testing; tool; trait; user friendly software; Work