SBIR-STTR Award

Geospatial Database Generation Agents
Award last edited on: 3/25/2009

Sponsored Program
SBIR
Awarding Agency
DOD : Army
Total Award Amount
$846,414
Award Phase
2
Solicitation Topic Code
A07-124
Principal Investigator
Mark H Butler

Company Information

Linguastat Inc

330 Townsend Street Suite 108
San Francisco, CA 94107
   (415) 814-2999
   info@linguastat.com
   www.linguastat.com
Location: Single
Congr. District: 11
County: San Francisco

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2008
Phase I Amount
$119,605
We propose to develop technology for geospatial database generation (GDBGEN) by combining six key components: 1) intelligent crawling to identify and efficiently process documents on the World Wide Web relevant to a specific location or geographic feature; 2) spatio-temporal parsing that uses new innovations to recognize and resolve location names and time expressions, together with an automated recognition engine built using conditional random fields; 3) text parsing that uses natural language processing to break texts into appropriate lexical, syntactic, and semantic units and identify a wide range of descriptive features expressed in tables or natural language sentences; 4) a coordination component that relates locations, times, and descriptive features to each other, both within and across document boundaries, in a collection of semantic “geospatial knowledge structures”; 5) a reasoning component that provides a robust, broad-coverage framework for assigning confidence scores and selecting the best factoids from a set of potentially conflicting candidates; and 6) a data management layer that facilitates user interaction at multiple levels of granularity and enables discovery, visualization, and export of data using open standards.

Keywords:
Named Entity Extraction, Geoparsing, Data Mining, Agents

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
2008
Phase II Amount
$726,809
We will develop technology for geospatial database generation agents (GDGA) by combining six key components demonstrated in Phase I: 1) intelligent crawling to identify and efficiently process OSINT sources relevant to a specific location or geographic feature; 2) spatio-temporal parsing that uses new innovations to recognize and resolve location names and time expressions, together with an automated recognition engine built using conditional random fields; 3) text parsing that uses linguistic processing to break texts into appropriate lexical, syntactic, and semantic units and identify a wide range of location features expressed in tables or natural language sentences; 4) a coordination component that uses semantic queries to relate locations, times, and descriptive features to each other, both within and across document boundaries; 5) a reasoning component that provides a modular framework for assigning confidence scores and selecting the best factoids from a set of potentially conflicting candidates; and 6) a data management layer that facilitates user interaction at multiple levels of granularity and enables discovery, visualization, and export of data using open standards. In Phase II we will refine these components, integrate them into a web service and a standalone application, and demonstrate a broad class of GDGA applications.

Keywords:
Information Extraction, Toponym Resolution, Web Mining, Intelligent Agents
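
Illustrative sketch (not part of the award record): component 5 in the abstracts above assigns confidence scores to candidate factoids and selects the best one when sources conflict. The abstracts do not say how this is done, so the following Python is only one plausible approach under stated assumptions. The Factoid fields, the select_best function, and the noisy-OR scoring rule are all hypothetical choices made for illustration, not the awardee's method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Factoid:
    """Hypothetical candidate fact extracted from one document."""
    location: str      # resolved place name
    attribute: str     # descriptive feature, e.g. "population"
    value: str         # extracted value, kept as text
    source: str        # identifier of the originating document
    confidence: float  # extractor confidence in [0, 1]


def select_best(candidates):
    """Return {(location, attribute): (value, score)} for the top-scoring value."""
    # Group candidates by slot, then by each competing value for that slot.
    slots = defaultdict(lambda: defaultdict(list))
    for f in candidates:
        slots[(f.location, f.attribute)][f.value].append(f)

    best = {}
    for slot, by_value in slots.items():
        scored = {}
        for value, group in by_value.items():
            # Noisy-OR: chance that at least one supporting source is right,
            # treating sources as independent (a simplifying assumption).
            miss = 1.0
            for f in group:
                miss *= 1.0 - f.confidence
            scored[value] = 1.0 - miss
        winner = max(scored, key=scored.get)
        best[slot] = (winner, scored[winner])
    return best


# Made-up example data: two sources agree, one disagrees.
candidates = [
    Factoid("Exampleville", "population", "30000", "doc-a", 0.6),
    Factoid("Exampleville", "population", "30000", "doc-b", 0.5),
    Factoid("Exampleville", "population", "45000", "doc-c", 0.7),
]
result = select_best(candidates)
```

Under this rule, agreement between two moderately confident sources can outweigh a single more confident source. A production system would likely also weigh source reliability and recency, which this sketch ignores.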