SBIR-STTR Award

A Web-Enabled Database for Rapid Metagenomic Biocatalyst Discovery and Validation
Award last edited on: 1/11/18

Sponsored Program
SBIR
Awarding Agency
NIH : NIGMS
Total Award Amount
$1,507,951
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Jeffrey Kim

Company Information

Radiant Genomics Inc

5980 Horton Street
Emeryville, CA 94608
   (415) 801-8073
   info@zymergen.com
   www.radiantgenomics.com
Location: Single
Congr. District: 13
County: Alameda

Phase I

Contract Number: 1R43GM113357-01
Start Date: 12/1/14    Completed: 5/31/15
Phase I year
2015
Phase I Amount
$216,675
Enzyme biocatalysis plays an important role in pharmaceutical synthesis as it can afford precise control over stereochemistry while achieving many conversions under physiological conditions. These benefits can both lower manufacturing costs and reduce environmental impact by eliminating solvent and heavy-metal catalyst steps in a synthetic strategy. Recent examples include the enzyme-catalyzed synthesis of the antidiabetic medication sitagliptin and the biomanufacturing of the antimalarial medication artemisinin. Efficient enzyme discovery, engineering, and validation methods are required as the foundation of these efforts. As organisms occupy diverse environments, enzymes involved in identical conversions often possess dramatically different catalytic capacity and can harbor auxiliary domains that are critical for performance. As a result, it has been routinely shown that broadening search efforts by drawing from diverse source organisms is the most efficient strategy for exploring catalytic fitness landscapes. Due to the cost and time associated with accessing large numbers of enzyme variants through DNA synthesis, researchers typically focus on testing a subset of variants that have been previously characterized or that are easy to obtain. This approach is fundamentally limited because 1) the overwhelming majority of available biocatalysts have not been studied in detail due to cultivation bottlenecks, and 2) there are no robust methods of predicting enzyme activity from primary sequence. To address this limitation, we propose to create a platform in which all computationally-identified enzyme variants in a database search can be immediately isolated, engineered, and delivered to end users through a publicly available, bioinformatics and LIMS-driven automation platform. Our strategy is ~1,000x less expensive and up to 300x faster than the current theoretical limits of DNA synthesis.
The platform is built upon a massive metagenomic library, the largest reported in the world, which overcomes the primary limitation of studying enzymes from cultivated sources. This collection contains several orders of magnitude greater enzyme diversity than can be found in culture collections. To populate the database, we will apply our patented high-efficiency sequencing method to our metagenomic library, generating an unprecedented data set with an N50 of >50 kb. End users will be able to search and identify enzyme variants in this data set. Both native sequences and combinatorial libraries of variants can then be retrieved, produced, and automatically delivered to end users in less than a week and at far lower cost than DNA synthesis. Overall, this system comprises an end-to-end, rapid biocatalyst discovery, engineering, and delivery system that will be a powerful resource for end users in basic research and industrial biotechnology.

Public Health Relevance Statement:


The antimalarial medication artemisinin and the enzyme-catalyzed synthesis of the antidiabetic medication sitagliptin are two prominent successes of bio-based manufacturing of therapeutics. For biomanufacturing to succeed consistently, enzymes in the synthesis pathway must be extremely fast and efficient so that the target molecules are produced at cost-effective yields, yet the available choices for many of these pathway enzymes are very limited and slow. We propose to sequence the largest and most diverse metagenomic library reported to date and build a web-accessible platform that will give researchers access to orders of magnitude more enzyme diversity to choose from when building out biomanufacturing pathways.

Project Terms:
Acoustics; Address; Antidiabetic Drugs; Antimalarials; artemisinine; Artemisinins; Automation; base; Basic Science; Bioinformatics; Biological Assay; Biomanufacturing; Biotechnology; catalyst; Cloud Computing; Collaborations; Collection; combinatorial; Complex; Computer Analysis; Computer software; Cosmids; cost; cost effective; Data; Data Set; Databases; design; DNA; DNA biosynthesis; Engineering; Environment; Environmental Impact; enzyme activity; enzyme pathway; Enzymes; fitness; Foundations; Gene Cluster; Gene Order; gene synthesis; Genes; Genome; Genomics; Goals; Heavy Metals; Homologous Gene; Informatics; Internet; Legal patent; Length; Libraries; Liquid substance; Metadata; metagenome; metagenomic sequencing; Metagenomics; Methods; model organisms databases; Open Reading Frames; Organism; Pathway interactions; Performance; Pharmaceutical Preparations; Pharmacologic Substance; Physiological; Play; Positioning Attribute; Process; Protein Family; public health relevance; Reporting; Research Personnel; research study; Resources; Role; Solvents; Source; stereochemistry; success; System; Testing; Therapeutic; Time; Validation; Variant; web services; web-accessible; web-enabled; Writing

Phase II

Contract Number: 2R44GM113357-02
Start Date: 12/1/14    Completed: 3/31/18
Phase II year
2016
(last award dollars: 2017)
Phase II Amount
$1,291,276

Radiant Genomics proposes to develop an integrated enzyme discovery service, the Enzyme Variant Engine (EVE), built upon the largest cloned metagenomic sequence collection reported to date. The goal is to combine a publicly-accessible search engine, richly-annotated sequence database, arrayed sample library, and LIMS automation platform to deliver novel enzyme variants to end-users for lower cost, in less time, and from a greater pool of biodiversity than alternative options, such as DNA synthesis. Importantly, this service overcomes a major bottleneck in enzyme discovery, which has traditionally focused on easily-cultivated organisms that are now known to represent less than 1% of biodiversity. Phase I research and development milestones were met or exceeded. In particular, we successfully demonstrated a high-efficiency sequencing workflow that will allow us to sequence and assemble our clone library, which is predicted to encode ~600M genes, >99% of which are derived from uncultivated and essentially unstudied organisms. We next demonstrated a combinatorial barcoding strategy that yields assemblies with an average length of >30 kilobases, a dramatic improvement in metagenomic contiguity. This feature enables the discovery of clusters of functionally related genes, such as those responsible for complex natural product biosynthesis and nutrient fixation. These services were successfully integrated into an online search engine and e-commerce platform available at www.eve.bio. Finally, we developed and demonstrated infrastructure for an automated LIMS gene recovery system that can recover thousands of genes of interest from our arrayed library per week. The success of Phase I research was complemented by general improvements in sequencing cost-efficiency and cloud computing. The EVE service has gained commercial traction, and we believe further development will benefit basic research while positively impacting a broad range of biomanufacturing processes.
Based on customer feedback, the aims of this proposal are 1) continued sequencing of the library using contiguity-preserving strategies, 2) scaling of computational infrastructure, 3) development of advanced enzyme selectors, and 4) third-party database integration. The overall outcome of this program will be a centralized search engine that allows end-users to rapidly select and receive genes identified in bioinformatic analyses. These genes will be accessible for lower cost, in less time, and from a greater pool of genetic diversity than through existing services. Overall, we believe that our platform will improve our understanding of sequence-to-function relationships and annotation for metagenomic environments, helping to bridge the gap between in silico and biochemical characterization from unexplored pools of genetic diversity.

Public Health Relevance Statement:


Bio-based manufacturing is poised to become a major economic driver due to advances in genetic and metabolic engineering. Most enzymes involved in biomanufacturing, however, are derived from the less than 1% of total biodiversity that is easily cultivated. To address this, we aim to make genes encoded in our metagenomic clone library, the largest and most diverse reported to date, accessible through a public search engine and delivery service, providing researchers with access to orders of magnitude more enzyme diversity at lower cost than competing methods.

Project Terms:
Acoustics; Address; Advanced Development; Algorithms; Apache Indians; Automation; base; Basic Science; Biochemical; Biodiversity; Bioinformatics; Biological Assay; Biomanufacturing; Characteristics; Cloud Computing; Cluster Analysis; Collection; combinatorial; Communities; Complement; Complex; computer infrastructure; Computer Simulation; cost; Data; Data Set; Databases; Development; distributed data; DNA; DNA biosynthesis; e-commerce; Economics; Environment; Enzymes; Feedback; Gene Delivery; gene synthesis; Genes; Genetic Engineering; Genomics; Goals; Guanine + Cytosine Composition; Imagery; improved; interest; Laboratories; Length; Libraries; Liquid substance; meetings; metabolic engineering; Metadata; metagenome; metagenomic sequencing; Metagenomics; Methods; Microfluidics; Mining; Molecular Biology; nanolitre; Natural Products; novel; Nucleotides; Nutrient; Organism; Outcome; Performance; Phase; Preparation; Principal Component Analysis; Process; programs; Property; Proteins; Protocols documentation; public health relevance; Recovery; Reporting; Research; research and development; Research Infrastructure; Research Personnel; sample fixation; Sampling; Services; Site; Source; Structure; success; System; Technology; thermostability; Time; tool; Traction; Validation; Variant; Variation (Genetics); web-enabled