Most of today's cross-domain solutions allow only well-structured text to be automatically parsed and shared across different security levels. Unstructured text must be released by a human reviewer. While dirty-word searches and classification markings are helpful, the reviewer still needs to spend an inordinate amount of time reading the document. Documents often contain very similar text, yet there is no reliable method for comparing a document with what has previously been released or denied, so the reviewer must critically re-evaluate the entire document before releasing it. The focus of this effort is to reduce the amount of time the reviewer has to spend on a document and to increase the level of assurance (accuracy) in sharing information across security domains. We will develop a complete approach for reducing the reviewer's workload by using NLP techniques to parse and tag a document, summarizing (distilling) the content of the unstructured data, categorizing the document, and using word/phrase checking and machine learning techniques to compare it with previously released documents. Phase I will provide an initial prototype that assists the reviewer in identifying the areas of the document on which he or she should focus. A robust and complete prototype will be implemented in Phase II.
Keywords: CDS, Cross-Domain, Multi-Level Security, Natural Language Processing, Similarity Matching, Machine Learning
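
As a rough illustration of two of the steps named in the abstract (word/phrase checking, and comparison against previously released or denied documents), the following Python sketch may help. It is not the proposed system: the dirty-word list, the function names, and the use of cosine similarity over plain term-frequency vectors are all assumptions made for this example, and a real prototype would likely rely on richer NLP features and a trained model.

```python
import math
import re
from collections import Counter

# Hypothetical dirty-word list (assumption); a real list would be site-specific.
DIRTY_WORDS = {"secret", "noforn"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def dirty_word_hits(text, dirty_words=DIRTY_WORDS):
    """Return the sorted dirty words that occur in the text."""
    return sorted(set(tokenize(text)) & set(dirty_words))


def cosine_similarity(a, b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    # Empty texts have no direction, so treat their similarity as zero.
    return dot / norm if norm else 0.0


def most_similar_prior(candidate, prior_decisions):
    """Find the closest previously reviewed document.

    prior_decisions is a list of (text, decision) pairs, where decision is
    a label such as "released" or "denied" (hypothetical labels).
    Returns (score, decision, text), or None if there is no history.
    """
    if not prior_decisions:
        return None
    scored = (
        (cosine_similarity(candidate, text), decision, text)
        for text, decision in prior_decisions
    )
    return max(scored, key=lambda item: item[0])
```

A reviewer-facing tool could call `dirty_word_hits` to highlight flagged terms and `most_similar_prior` to surface the nearest earlier decision, leaving the release decision itself with the human.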