SBIR-STTR Award

Distributed Relational Learning for Cloud Data Fusion
Award last edited on: 11/20/2018

Sponsored Program
SBIR
Awarding Agency
DOD : Navy
Total Award Amount
$1,564,425
Award Phase
2
Solicitation Topic Code
N132-135
Principal Investigator
Nicholas Hamblet

Company Information

Commonwealth Computer Research Inc (AKA: CCRi)

1422 Sachem Place Unit 1
Charlottesville, VA 22901
   (434) 977-0600
   info@ccri.com
   www.ccri.com
Location: Single
Congr. District: 05
County: Albemarle

Phase I

Contract Number: N00014-14-P-1092
Start Date: 10/28/2013    Completed: 8/28/2014
Phase I year
2014
Phase I Amount
$149,946
The US military and intelligence community have been successfully fusing the data they gather into actionable intelligence. However, the volume of data is increasing such that it cannot be processed on a single server, calling for distributed data fusion algorithms that operate across a cloud. As data grows to the point of requiring distributed storage, machine learning algorithms capable of producing situational awareness must rise to the challenge of working with distributed storage as well. The problem is to design distributed fusion algorithms that not only do as well as single-server solutions, but also leverage larger volumes of data to produce higher quality analytics. This proposal outlines an architecture that works with distributed data sources without needing data to be directly shared between compute nodes. Data fusion without shared memory is a difficult task; however, we develop techniques to minimize the amount of information sent between nodes while maintaining high quality fusion. We propose to use models for which both model learning and inference can leverage distributed storage and computation. Inference should be fast, and detached model instances should be readily deployable to local servers for real-time use, while maintaining data and model integrity with the cloud.

Benefit:
In this effort CCRi will develop prototype algorithms to facilitate large-scale distributed fusion, including level 1 (entity resolution) and level 2 (inference) fusion. These algorithms will make it feasible to achieve fusion from a large multi-source knowledge store using scalable but accurate machine learning models. Entity resolution and inference will be possible across nodes without requiring that all data be shared over limited bandwidth networks, making it possible to fuse data and infer new information without requiring a full local copy of the data set. This will augment evolving cloud computing systems in the Department of Defense and Intelligence Community with a platform for automated reasoning on large sets of enriched data, a capability that is currently lacking. Moreover, these goals will be accomplished by developing a statistical relational learning algorithm leveraging scalable learning algorithms composed of Restricted Boltzmann Machines, laying the groundwork for distributed deep learning and graph learning services that can be applied to a wide range of problems.

Keywords:
entity disambiguation, knowledge inference, probabilistic reasoning, cloud computing, data fusion, statistical relational learning, Machine Learning, Deep Learning

Phase II

Contract Number: N00014-15-C-0109
Start Date: 4/9/2015    Completed: 8/9/2017
Phase II year
2015
Phase II Amount
$1,414,479
CCRi will enhance the utility of large-scale, geographically separated, semantic datasets by developing distributed capabilities for level 1 (entity resolution) and level 2 (inference) data fusion. CCRi will extend the model training server developed in Phase I, as well as the advanced techniques for concept extraction and visualization of large-scale semantic datasets developed during a parallel effort, to support streaming data, Map/Reduce model training, and integration with enriched data providers, as well as simplified model training. Models which incorporate temporal information will enable advanced predictive capabilities for relationships and concepts over time, which CCRi will investigate during Phase II. CCRi's primary focus in Phase II will be on the fusion of multiple models trained independently on distinct data sources, enabling model sharing without full data sharing, and fusion across clouds.

Benefit:
The techniques investigated in Phase II, and the software which implements them, will enable insights obtained from automated analysis on isolated datasets to be combined across datasets. Level 1 (entity resolution) and Level 2 (inference) data fusion on large-scale semantic datasets, combined with advanced concept extraction and visualization, will enable operators to quickly obtain both broad and specific understanding of large relational datasets, and to apply advanced predictive capabilities to this data.

Keywords:
Distributed Architecture, cloud computing, inference, Concept Extraction, relational learning, entity resolution, data fusion, Visualization