This project aims to build a modular component for a future end-to-end system capable of distilling, analyzing, discovering, structuring, and interpreting relevant information hidden in massive data already stored in distributed multi-INT databases, including but not limited to social network data, ISR sensor data, and Internet traffic data. Our approach is based on topological data analysis, a method that focuses on recovering the topology of an unknown space from noisy data points sampled from it and embedded in a high-dimensional space. Building on our preliminary analysis in Phase I, we will continue constructing combinatorial representations of point sets and developing algorithms for the effective computation of robust topological invariants. Emphasis will be placed on supporting decentralized and parallel processing so that the modular system can be deployed on clusters of computers. Cluster-based computing approaches such as MapReduce in a Hadoop-based ecosystem can then be used to wrap the algorithms, with the implementation built on open-source software.
Keywords: Massive Data, Parallel Algorithms, Topological Data Analysis, Social Networks, Anomaly Detection, Cloud Computing
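
As a purely illustrative sketch of what a combinatorial representation of a point set and a simple topological invariant can look like, the Python snippet below links every pair of points closer than a scale eps and counts the connected components of the resulting graph (the zeroth Betti number at that scale). The function name betti0_at_scale, the sample coordinates, and the choice of a brute-force neighborhood graph with union-find are assumptions made for illustration; they are not the project's algorithms, and the quadratic pairwise loop is not meant to reflect the parallel design described above.

```python
import math
from itertools import combinations


def betti0_at_scale(points, eps):
    """Count connected components of the eps-neighborhood graph of `points`.

    Hypothetical helper: two points are joined by an edge when their
    Euclidean distance is at most `eps`.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pass over all pairs; merge the components of close points.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Each remaining root represents one connected component.
    return sum(1 for i in range(len(points)) if find(i) == i)


# Made-up sample: two small, well-separated noisy clusters in the plane.
sample = [(0.0, 0.0), (0.1, 0.05), (0.05, 0.1), (5.0, 5.0), (5.1, 4.95)]
for eps in (0.01, 0.5, 10.0):
    print(eps, betti0_at_scale(sample, eps))
```

On this sample the component count is 5 at the smallest scale, 2 at the intermediate scale, and 1 at the largest. A count that stays fixed over a wide range of scales is the kind of feature that a robust topological invariant is meant to capture.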