Knowledge-based Extensions of Protein-Protein Docking
Award last edited on: 11/25/2019

Sponsored Program
Awarding Agency
Total Award Amount
Award Phase
Solicitation Topic Code
Principal Investigator
David R Hall

Company Information

Acpharis Inc

160 North Mill Street
Holliston, MA 01746
(508) 893-0667
Location: Single
Congr. District: 02
County: Middlesex

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
Phase I Amount
Protein–protein interactions are involved at multiple points in nearly all biological pathways. Because determining the structure of protein complexes by X-ray crystallography is expensive and slow, it is important to develop computational docking methods that, starting from the structures of the component proteins or from homology models, can determine the structure of their complexes. Accordingly, there is increasing demand for protein docking methods in the pharmaceutical industry. Acpharis has licensed the docking program PIPER, developed at Boston University. PIPER is the engine of the automated protein docking server ClusPro, which has been the best in the server category at CAPRI (Critical Assessment of Predicted Interactions), the ongoing worldwide protein docking competition, since 2004. The best human predictor groups, however, frequently outperform the servers, primarily because the teams can use all information available in the literature as well as in protein sequence, structure, and interaction databases. In contrast, the current version of PIPER performs strictly “ab initio” direct docking in isolation, without using this large body of knowledge. The general goal of this proposal is to develop algorithms and software that convert PIPER into an expert system able to take advantage, in an automated fashion, of all information related to the proteins to be docked. Because additional information improves performance, we expect that the extended version of PIPER will be competitive with the human predictors, even in the hands of users without computational biology experience. To achieve this goal, we will first mine the databases to identify potential orthologs. If appropriate structural templates for the target complex are found, we will use template-based modeling.
However, templates are available only for a small fraction of complexes, and hence in most cases we need direct docking of the component proteins, or of their models if only the sequence is available. The proteins will be decomposed into globular domains and short linear motifs (SLIMs). Domain-domain interactions will be modeled by determining the structurally conserved portions of the domains that participate in the actual interactions. These regions can be reliably modeled and will be docked, with the remainder of each protein being modeled in the presence of the complex. We will also dock the interacting domains of the orthologs and use the consensus of the results to improve the selection of the best model among the different docked structures. Domain-SLIM interactions will be treated separately using a novel peptide docking algorithm that accounts for peptide flexibility. We propose a protocol in which the potential conformations of a peptide that includes a specific sequence motif are extracted from the Protein Data Bank and docked using PIPER. It has been shown that clustering the resulting structures and selecting the largest clusters of low-energy conformations provides acceptable docking results even without any prior information on the potential binding site. With the above knowledge-based extensions, PIPER will be able to model a much larger set of interactions more reliably.
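
The selection step described above (cluster the docked structures, then prefer the largest clusters of low-energy conformations) can be illustrated with a minimal sketch. This is not PIPER or ClusPro code. The function names `rmsd` and `greedy_cluster`, the `radius` and `keep_fraction` parameters, and the toy coordinates are all assumptions introduced for illustration, and the poses are assumed to share one reference frame so that a plain coordinate RMSD is meaningful.

```python
from math import sqrt


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized lists of 3D points."""
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return sqrt(total / len(a))


def greedy_cluster(poses, energies, radius, keep_fraction=0.5):
    """Greedy clustering of the lowest-energy poses (hypothetical helper).

    Returns a list of (center_index, member_indices) tuples, largest
    cluster first. All parameter choices here are illustrative assumptions.
    """
    # Keep only the lowest-energy fraction of the docked poses.
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    n_keep = max(1, int(len(order) * keep_fraction))
    remaining = set(order[:n_keep])

    # Neighbours of each retained pose within the clustering radius
    # (every pose is its own neighbour, since its RMSD to itself is 0).
    neighbours = {
        i: {j for j in remaining if rmsd(poses[i], poses[j]) <= radius}
        for i in remaining
    }

    clusters = []
    while remaining:
        # Center = pose with the most unassigned neighbours; ties are
        # broken by lower energy, then by lower index.
        center = max(
            remaining,
            key=lambda i: (len(neighbours[i] & remaining), -energies[i], -i),
        )
        members = neighbours[center] & remaining
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters


# Toy data: six single-point "poses" forming groups of three, two and one.
toy_poses = [
    [(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)], [(0.2, 0.0, 0.0)],
    [(5.0, 0.0, 0.0)], [(5.1, 0.0, 0.0)], [(20.0, 0.0, 0.0)],
]
toy_energies = [-10.0, -9.0, -8.0, -7.0, -6.0, 5.0]
toy_clusters = greedy_cluster(toy_poses, toy_energies, radius=1.0, keep_fraction=1.0)
```

In this sketch the cluster population, rather than the energy of any single pose, determines the ranking, which mirrors the idea of selecting the largest clusters among low-energy conformations. Real docking output would need a ligand or interface RMSD and a much larger pose set.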

Public Health Relevance Statement:
Protein-protein interactions are involved in nearly all biological pathways and are increasingly a target for pharmaceutical companies, either through disruption by small molecules or through modulation by biologics. A 3D structure of a complex provides insight into these interactions, but determining one experimentally can be slow or expensive, leading many to turn to computational prediction of the structure. The general goal of this Phase I SBIR proposal is to incorporate the increasing amount of existing sequence and structural information into the predictive computational modeling of novel protein-protein complexes.

Project Terms:
Algorithmic Software; Algorithms; Amino Acid Sequence; Back; base; Binding; Binding Sites; Bioinformatics; Biological; Biotechnology; Boston; Categories; commercialization; Complex; Computational Biology; Computer Simulation; Computer software; Consensus; crosslink; Data; Databases; Development; Docking; Drug Industry; experience; Expert Systems; flexibility; globular protein; Goals; Homologous Protein; Homology Modeling; Human; improved; insight; Knowledge; knowledge base; Letters; Licensing; Literature; Mediating; Methods; model building; Modeling; Molecular Conformation; novel; Orthologous Gene; Pathway interactions; Pattern; peptide structure; Peptides; Performance; Pharmacologic Substance; Phase; preference; programs; protein complex; protein protein interaction; protein structure; Proteins; Protocols documentation; Rest; restraint; Small Business Innovation Research Grant; small molecule; software development; Structure; Tertiary Protein Structure; Testing; three dimensional structure; Universities; Variant; X-Ray Crystallography

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
Phase II Amount