The overall goal of our research is to develop and extend powerful exact statistical tools for testing genetic association, and to incorporate these methods into two existing, widely used software packages (Cytel Studio, SAS) that will serve the needs of data analysts in pharmaceuticals, genetic epidemiology and public health, and other fields that require a greater understanding of the genetic determinants of complex disease. The demand for these analytic tools is rising dramatically, as rapid progress in genotyping technology is making it easier and less costly to genotype sampled subjects at ever larger numbers of genetic markers. Genetic association is an observed correlation between a genetic marker under investigation and some physical trait, and can be assessed using either traditional case-control or family-based study designs. Under either design, there are compelling applications of permutation or exact statistical approaches that are computationally challenging, and that are either unavailable in currently used software or implemented in a manner that requires excessive memory or computation. The computational innovations developed for this project will fill this gap, significantly improving the efficiency and power of existing tools used for genetic association under both family-based and case-control designs.
During Phase I, we will build a prototype computer program that includes (i) exact family-based tests for both biallelic and multiallelic markers, and (ii) a permutation procedure that simultaneously tests genetic association assuming various modes of inheritance (i.e., recessive, dominant, additive, or codominant). We will also investigate the feasibility of incorporating these procedures into a SAS PROC, complementing and extending currently implemented SAS JMP Genomics procedures for testing genetic association.
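The idea behind the permutation procedure in (ii) can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes a case-control design, genotypes coded as 0, 1, or 2 copies of a candidate risk allele, and a maximum-statistic approach in which the largest trend statistic across inheritance models is compared with its permutation distribution. All function and variable names are hypothetical, and the codominant (two-degree-of-freedom) model is omitted for brevity.

```python
import random

# Genotype scores (0, 1, 2 copies of the candidate allele) under three
# inheritance models. These codings are illustrative assumptions.
MODES = {
    "recessive": (0, 0, 1),
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
}

def trend_stat(scores, status):
    """Return N * r^2 between coded genotype and case status.

    For a fixed set of scores this matches the form of the
    Cochran-Armitage trend statistic.
    """
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(status) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, status))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no evidence of association
    return n * sxy * sxy / (sxx * syy)

def max_stat(genotypes, status):
    """Largest trend statistic over all inheritance models in MODES."""
    return max(
        trend_stat([code[g] for g in genotypes], status)
        for code in MODES.values()
    )

def max_permutation_test(genotypes, status, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for the maximum statistic.

    Case/control labels are shuffled while genotypes stay fixed, so the
    p-value accounts for having tried several inheritance models at once.
    """
    rng = random.Random(seed)
    observed = max_stat(genotypes, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance guards against floating-point ties.
        if max_stat(genotypes, labels) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the same shuffled labels are used for every inheritance model within a permutation, the correlation among the model-specific statistics is preserved, which is what allows a single p-value to cover all models simultaneously.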
In Phase II, we will integrate our Phase I tools into Cytel's StatXact system and into the SAS JMP Genomics system as an external procedure. We will additionally (i) extend the exact family-based procedures to accommodate haplotype data, (ii) develop and implement algorithms for permutation approaches to large-scale screening experiments, (iii) incorporate exact versions of basic genetic epidemiologic procedures, and (iv) incorporate efficient Monte Carlo sampling tools to extend the usefulness of the exact procedures to larger data sets.
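To make the relationship between an exact family-based test and its Monte Carlo counterpart concrete, the following sketch uses the simplest case: a biallelic transmission/disequilibrium-style test in which heterozygous parents transmit one allele `b` times and the other allele `c` times, and the null hypothesis is that each transmission is a fair coin flip. This is an illustrative assumption, not the project's method; the function names are hypothetical, and the multiallelic and haplotype extensions described above are not covered.

```python
import random
from math import comb

def exact_tdt_pvalue(b, c):
    """Exact two-sided p-value under Binomial(b + c, 0.5).

    Sums the null probabilities of all transmission counts that are at
    least as unbalanced as the observed split (b versus c).
    """
    n = b + c
    if n == 0:
        return 1.0  # no informative transmissions
    deviation = abs(b - c)
    return sum(
        comb(n, k) * 0.5 ** n
        for k in range(n + 1)
        if abs(2 * k - n) >= deviation
    )

def monte_carlo_tdt_pvalue(b, c, n_sim=20000, seed=0):
    """Monte Carlo estimate of the same p-value by simulating transmissions.

    Sampling replaces full enumeration, which is the kind of trade-off
    that matters when the exact reference set is too large to enumerate.
    """
    rng = random.Random(seed)
    n = b + c
    deviation = abs(b - c)
    hits = 0
    for _ in range(n_sim):
        k = sum(rng.random() < 0.5 for _ in range(n))
        if abs(2 * k - n) >= deviation:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

For example, `exact_tdt_pvalue(10, 0)` sums only the two most extreme outcomes and gives 2/1024. In this toy setting the exact calculation is cheap, so the Monte Carlo version serves only to show how a sampled estimate approximates the exact value.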
Public Health Relevance: Rapid progress in genotyping technology is making it easier and less costly to measure increasingly large numbers of genetic markers in sampled individuals. These markers can be used to identify new genes potentially associated with many complex diseases. This project will provide genetics researchers with more accurate and efficient statistical tools for analyzing data from these studies.
Thesaurus Terms: Accounting; Algorithms; Alleles; Allelomorphs; Analysis, Data; Articulation; Complement; Complement Proteins; Complex; Computer Programs; Computer Programs And Programming; Computer Software; DNA; Data; Data Analyses; Data Banks; Data Bases; Data Set; Databank, Electronic; Databanks; Database, Electronic; Databases; Dataset; Deoxyribonucleic Acid; Development And Research; Disease; Disorder; Epidemiology; Equilibrium; Family; GWAS; Genes; Genetic; Genetic Determinism; Genetic Heterogeneity; Genetic Markers; Genetic Screening; Genetic Analyses; Genomics; Goals; Haplotypes; Human; Human, General; Investigators; Joints; Knowledge; Link; Location; Man (Taxonomy); Man, Modern; Measurement; Measures; Memory; Methods; Network-Based; Pedigree; Pharmaceutical Agent; Pharmaceuticals; Pharmacologic Substance; Pharmacological Substance; Phase; Polymorphism, Single Base; Population; Procedures; Public Health; R & D; R&D; Research; Research Design; Research Personnel; Researchers; SNP; SNPs; Subgp; Sampling; Screening Procedure; Single Nucleotide Polymorphism; Software; Statistical Methods; Study Type; Subgroup; System; System, LOINC Axis 4; Testing; Association Test; Balance; Balance Function; Base; Case Control; Clinical Data Repository; Clinical Data Warehouse; Computer Program; Computer Program/Software; Computer Programming; Data Repository; Design; Designing; Disease/Disorder; Experience; Experiment; Experimental Research; Experimental Study; Genetic Analysis; Genetic Association; Genetic Determinant; Genetic Epidemiology; Genetic Pedigree; Genome Wide Association Scan; Genome Wide Association Studies; Genome Wide Association Study; Genome-Wide Scan; Genomewide Association Scan; Genomewide Association Studies; Genomewide Association Study; Genomewide Scan; Genotyping Technology; Improved; Innovate; Innovation; Innovative; Pedigree Structure; Prototype; Public Health Medicine (Field); Public Health Relevance; Relational Database; Research And Development; Research Study; Screening; Screenings; Study Design; Tool; Trait; User Friendly Computer Software; User Friendly Software; Whole Genome Association Studies; Whole Genome Association Study