SBIR-STTR Award

Highly resource-efficient protein engineering using machine learning
Award last edited on: 12/16/21

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$256,000
Award Phase
1
Solicitation Topic Code
BT
Principal Investigator
Surojit (Surge) Biswas

Company Information

Nabla Bio Inc

127 Western Avenue
Boston, MA 02134
   (919) 757-1609
   N/A
   www.nabla.bio
Location: Single
Congr. District: 07
County: Suffolk

Phase I

Contract Number: 2051603
Start Date: 4/1/21    Completed: 11/30/21
Phase I year
2021
Phase I Amount
$256,000
The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project is to improve and accelerate protein engineering, and to reduce its cost, across diverse industries including industrial biocatalysts, biomanufacturing, food technology, and therapeutics. Today, late-stage protein engineering represents a major time, labor, and financial bottleneck. Since real-world translation is the focus of late-stage development, assays are more reflective of their end-use application and therefore necessarily require more time, labor, and capital. This precludes many variants from being screened at this stage. Failure at these late stages of development is costly, and often results from a change in environmental parameters from test conditions in early high-throughput screens. Accurate prediction of protein variants based on minimal data but with high likelihood of function under end-use conditions is a critical unmet need.
The proposed project will demonstrate the feasibility of leveraging a machine learning model, trained on raw protein sequences, mutagenesis datasets, and natural sequence-function pairs, to predict highly functional variants of a protein of interest (POI) without sequence-function datasets specific to the selected POI and application. Such an approach, known as zero-shot learning, has not been applied to protein engineering to date. To achieve this, a large-scale language model will be trained on almost 5 billion curated unlabeled protein sequences from public and private databases and a collection of mutagenesis datasets. This general knowledge model can then be fused with an application-specific top model derived from natural sequences (distinct from the POI) paired with parameters of their natural environments. This training is hypothesized to imbue the model with a notion of which sequence features improve protein function in a general sense, and under particular environmental conditions (e.g., high temperature or high salinity). To demonstrate the feasibility and utility of this approach, the model will be used in virtual directed evolution experiments to optimize two therapeutically relevant enzymes for function in non-native environments; the optimized variants will then be assessed for this function in vitro.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Phase II

Contract Number: ----------
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
----
Phase II Amount
----