SBIR-STTR Award

Information Extraction for New Emerging Noisy User-generated Micro-Text
Award last edited on: 9/19/21

Sponsored Program
STTR
Awarding Agency
DOD : AF
Total Award Amount
$900,000
Award Phase
2
Solicitation Topic Code
AF19B-T006
Principal Investigator
Steven Minton

Company Information

Inferlink Corporation

2361 Rosecrans Avenue Suite 348
El Segundo, CA 90245
(310) 341-2446
inquiry@inferlink.com
www.inferlink.com

Research Institution

University of Southern California

Phase I

Contract Number: FA8750-20-C-0204
Start Date: 1/23/20    Completed: 1/23/21
Phase I year
2020
Phase I Amount
$150,000
Neural networks have proved highly effective at extracting information from text. However, noisy microtext has proved particularly difficult because low-level syntactic cues are much less useful there. In this project, we propose to explore ways of incorporating strong semantic, expectation-based models into a neural net architecture to improve performance on microtext extraction. In Phase I, we will consider a strong semantic model that uses a massive knowledge graph to generate interest profiles from a user's tweet history, and we will investigate several ideas for incorporating that type of contextual model into a modern neural net architecture. Our approach, if successful, will not only improve extraction accuracy on noisy microtext but will also address one of the major problems with neural networks in dynamic domains where new entities emerge frequently. Neural models are trained on a frozen "snapshot" of the world, and the stored information can only be updated through potentially costly retraining or fine-tuning. Our work, if successful, will allow systems to be updated much more rapidly and effectively when the world changes.

Phase II

Contract Number: FA8750-22-C-0511
Start Date: 3/14/22    Completed: 3/14/24
Phase II year
2022
Phase II Amount
$750,000
Neural networks have proved highly effective at extracting information from text. However, noisy micro-text has proved particularly difficult because low-level syntactic cues are much less useful there. In this project, we investigate how information produced by semantic, expectation-based symbolic models can be injected into a neural net architecture to improve extraction accuracy on user-generated text that is short, noisy, and contains new, emerging entities. Our work in Phase I demonstrated the feasibility of our approach: we produced an implementation that exceeded the state of the art in experiments on extracting and linking entities in noisy text. In Phase II, we have three objectives. First, we intend to refine the architecture, building on ideas developed in Phase I, to achieve significant additional gains in accuracy. Second, we intend to expand the range of knowledge sources the architecture can take advantage of, and to show that the approach generalizes to other types of extraction tasks, such as relation extraction. Third, we plan to implement an end-to-end system, complete with APIs that enable rapid integration into larger NLP systems and a user interface for training and evaluation. From a practical point of view, our research, if successful, will not only improve extraction accuracy on noisy text but will also address one of the major problems with neural networks in dynamic domains where new entities emerge frequently. Neural models are trained on a frozen "snapshot" of the world, and the stored information can only be updated through potentially costly retraining or fine-tuning. Our work will allow systems to be updated much more rapidly and effectively when the world changes.