Phase II Amount
$1,167,973
Intelligence analysts without on-site translators are often relegated to using existing online translation services, which limit non-English text translation to small blocks of text with a capped number of characters. Today's translation services do not generate contextual information about entity relationships within the text or provide advanced analytical tools (e.g., sentiment or topic extraction) that increase understanding of the text. Many of the widely used natural language processing (NLP) tools (e.g., spaCy, StanfordNLP, Flair, and UDPipe) have limited ability to automatically extract entities from non-English text and have difficulty resolving the grammatical patterns, common in foreign languages, that fall outside the subject-predicate-object pattern. Current NLP tools are also limited by their focus on efficiency over accuracy, their use of models trained on small datasets, and their support for only a limited number of major languages [1-4]. Likewise, existing NLP tools are limited in their ability to identify relationships between entities in text. They often rely on the user manually creating a defined set of rules, an approach that is hindered by the complexity of word combinations in large volumes of data and that does not readily evolve with changes to the input data. Furthermore, NLP tools do not offer services that confirm or identify the source of the text, correlate source authors across documents, or readily identify differences in text sources based on language variances (e.g., sarcasm, figures of speech, and jargon). The overall objective is to develop, demonstrate, and deliver a text analytics tool that performs NLP directly on non-English text and allows users who are not proficient in a target language to gain relevant operational information. The tool will also provide information retrieval of relevant data artifacts and display the results (e.g., entity and event relation arguments) in a dynamic user interface.
The following subsections describe N-TISE NLP applications that will help intelligence analysts gain information from non-English text when the analyst does not have linguistic specialization in the target language. The modular dashboards benefit the end user by enabling the visualization of multiple NLP tasks simultaneously. Each dashboard will contain labeled functionality buttons and will provide instructions in hover text windows that walk users through each operational task. Furthermore, N-TISE will provide a written User Guide and pre-recorded step-by-step videos that demonstrate the system capabilities for DoD users who have received limited training on the system.