SBIR-STTR Award

Topological Robust Algorithms for Massive Data Sets via Agent-based Modular Infrastructure (TA-DA) Supporting Decentralized and Parallel Processing
Award last edited on: 2/1/2013

Sponsored Program: SBIR
Awarding Agency: DOD (OSD)
Total Award Amount: $839,929
Award Phase: 2
Solicitation Topic Code: OSD10-L07
Principal Investigator: Onur Savas

Company Information

Intelligent Automation Inc (AKA: IAI)

15400 Calhoun Drive Suite 190
Rockville, MD 20855
   (301) 294-5200
   contact@i-a-i.com
   www.i-a-i.com
Location: Single
Congr. District: 06
County: Montgomery

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I Year: 2011
Phase I Amount: $100,000
We propose to develop topologically robust dimensionality reduction algorithms for massive data sets. The algorithms will be wrapped in an agent-based modular infrastructure that supports decentralized and parallel data processing. Our algorithms are based on Mapper, a method that discovers the topology of data by constructing lower-dimensional simplicial complexes. Our goal is to reduce the mathematician's role in selecting Mapper's topological and geometric parameters and to automate that selection in parallel. Mapper is already inherently parallel because of its partial clustering step, in which each element of the cover is clustered independently. We will improve the parallelization further by borrowing ideas from streaming algorithms and from the construction of local alpha-complexes. Robustness of data discovery to noisy and missing data rests on persistent homology, in which only persistent structures survive and short-lived ones collapse. The algorithm will be assessed automatically by comparing its outputs at multiple resolutions, for example to distinguish real features from artifacts. Data will be queried by topological matching in the constructed simplicial complexes. The algorithmic infrastructure will be wrapped by a family of software agents built on Cybele, Intelligent Automation's agent infrastructure. As part of the infrastructure development, APIs will be developed to support modularization.
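For readers unfamiliar with Mapper, the construction described above can be sketched in a few lines. This is a minimal illustration, not IAI's implementation: the filter function, the number of intervals (`n_intervals`), the overlap fraction (`overlap`), and the single-linkage scale (`eps`) are hypothetical choices, and they are exactly the kind of parameters the project aimed to select automatically. The per-interval clustering loop is the "partial clustering" step whose independence makes Mapper naturally parallel.

```python
import math
from collections import defaultdict
from itertools import combinations


def single_linkage(points, idx, eps):
    """Cluster the points with indices `idx` by single linkage at scale eps,
    i.e. connected components of the eps-neighborhood graph (union-find)."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    clusters = defaultdict(set)
    for i in idx:
        clusters[find(i)].add(i)
    return list(clusters.values())


def mapper(points, filter_fn, n_intervals=4, overlap=0.25, eps=0.5):
    """Hypothetical minimal Mapper: cover the filter range with overlapping
    intervals, cluster each preimage, and connect clusters sharing points."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Interval k, widened by overlap * length on each side.
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        # Each interval is clustered independently; this loop body is the
        # part that could be handed to separate parallel workers.
        nodes.extend(single_linkage(points, idx, eps))
    # 1-skeleton of the nerve: an edge between clusters that share a point.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges
```

For 16 evenly spaced points on the unit circle with the x-coordinate as filter, this sketch yields six clusters joined in a single cycle, so the recovered graph reflects the loop in the data even though the filter itself is one-dimensional.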

Keywords:
Data Discovery, Topology Recovery, Dimensionality Reduction, Massive Data Sets, Simplicial Complex, Parallel Algorithms, Agent-Based Infrastructure, Persistent Homology

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II Year: 2012
Phase II Amount: $739,929
This project aims to build a modular component for a future end-to-end system capable of distilling, analyzing, discovering, structuring, and interpreting relevant information hidden in massive data already stored in distributed multi-INT databases, including but not limited to social network data, ISR sensor data, and Internet traffic data. Our approach is based on topological data analysis, which recovers the topology of an unknown space from noisy data points sampled from that space and embedded in a high-dimensional space. Building on our preliminary analysis in Phase I, we will continue constructing combinatorial representations of point sets and developing algorithms for the effective computation of robust topological invariants. Emphasis will be on supporting decentralized and parallel processing so that the modular system can be deployed on computer clusters. Cluster-computing approaches such as MapReduce in a Hadoop-based ecosystem can then be used to wrap the algorithms and implement them with open-source software.
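Persistent homology is what makes such topological invariants robust to noise. Its simplest case, 0-dimensional persistence (connected components) of a Vietoris-Rips filtration, can be computed with one union-find pass over edges sorted by length, essentially Kruskal's algorithm. The sketch below is illustrative only: it assumes Euclidean points and a brute-force O(n^2) edge list, the function name `h0_persistence` is hypothetical, and it does not compute higher-dimensional invariants such as loops or voids. It is not presented as the project's algorithm.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence of the Vietoris-Rips filtration.

    Every point is born at scale 0. Edges are added in order of length;
    when an edge joins two components, one of them dies at that length.
    Returns (birth, death) pairs; one component never dies (death = inf).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration order).
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge merges two components: one of them dies
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the last surviving component
    return pairs
```

On two tight, well-separated clusters of three points each, four bars die at about 0.1 (within-cluster merges, i.e. short-lived structure), while one bar survives until about 4.9 and one never dies. Keeping only the long bars recovers the two clusters, which is the sense in which "only persistent structures remain."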

Keywords:
Massive Data, Parallel Algorithms, Topological Data Analysis, Social Networks, Anomaly Detection, Cloud Computing