Current search technology relies primarily on matching keywords and key phrases. Minor errors (such as misspelled query terms), inadequate query terms, and a lack of automated search guidance can make it cumbersome, or even impossible, to retrieve relevant scientific and technical answers to questions of importance to DOE. This project will develop advanced Natural Language Processing techniques for textual resources and implement proof-of-concept pilot programs using pragmatic statistical, linguistic, and artificial intelligence software. Three complementary methodologies will be investigated: (1) extensible, customizable, language-independent spelling checkers; (2) identification of significant phrases and concepts using linguistic, statistical, and heuristic techniques; and (3) enhancement of search retrieval through dynamic use of semantic resources, such as thesauri and concept hierarchies, in the query interface. Phase I will begin by investigating the software currently used to aid spelling, an experimental natural language parser (a part-of-speech tagger), and phrase-identification modules (a phrase parser). The available tools also include a morphological analyzer and a variety of dictionary-building programs. Extending these components will ease the migration of their functionality into production systems.
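One common way to make a spelling checker language-independent is to treat the dictionary as a plain word list and rank candidate corrections by edit distance, so that no language-specific rules are required. The sketch below assumes that approach (Levenshtein distance over Unicode strings); it is not the project's actual software, and the function names `levenshtein` and `suggest`, the `max_distance` and `limit` parameters, and the sample lexicon are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: list[str], max_distance: int = 2, limit: int = 5) -> list[str]:
    """Hypothetical helper: return up to `limit` lexicon entries within
    `max_distance` edits of `word`, closest first (ties broken alphabetically).
    casefold() gives Unicode-aware, language-neutral case-insensitive matching."""
    target = word.casefold()
    scored = []
    for entry in lexicon:
        d = levenshtein(target, entry.casefold())
        if d <= max_distance:
            scored.append((d, entry))
    scored.sort()
    return [entry for _, entry in scored[:limit]]


if __name__ == "__main__":
    # Illustrative lexicon only; a real system would load a domain word list.
    lexicon = ["plutonium", "platinum", "polonium", "thorium"]
    print(suggest("plutonum", lexicon))  # → ['plutonium', 'platinum']
```

Because the lexicon is just data, extending or customizing the checker for another language or a specialized vocabulary means swapping in a different word list rather than changing code.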
Commercial Applications and Other Benefits as described by the awardee: A full-scale system based on Natural Language Processing should help direct research efforts toward solving problems that arise in real systems rather than toward theoretical models.