Phase II year
2011
(last award dollars: 2021)
Phase II Amount
$2,996,199
Harmonia proposes to continue its Phase I development of a Large Data Handling Architecture (LDHA). Given sufficient commodity hardware, the LDHA is designed to scale to ingest a terabyte or more of data per hour from each open source or sensor; to store tens of thousands or more of terabyte-sized files; and to support operations on databases with complex structures as table cells, sparse tables, billions or more rows, and a million or more columns. Our implementation uses the Ubuntu Server Linux distribution, Hadoop Core, the HBase database, and the Chukwa data collector. We implement ingesters as Java Servlets that continuously collect data from a variety of open sources and automatically pass it to Hadoop for insertion. These sources include, but are not limited to, structured and unstructured text, sensor data, images, and streaming video, and we experiment with real continuous feeds available on the Internet. We develop end-user programming tools that let analysts define analysis tasks for distributed execution even if they have no background in MapReduce programming. Our architecture is service-oriented, high-performance, and extensible, and it permits integration with the DoD Global Information Grid. We support disadvantaged users through services that allow data to be downloaded. We assess performance extensively to identify bottlenecks.
Keywords: Large-Scale Data, Sensor Networks, Net-Centric, GIG, Data Sharing, Meta-Data, Ontology, Hadoop
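
Illustrative sketch: the abstract describes sparse tables with very many possible columns and complex structures as table cells. The Java code below is a minimal in-memory illustration of that data model only. It is not the HBase API and not Harmonia's implementation. The class name SparseTable, its methods, and the sample row and column names are all hypothetical, chosen for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of a sparse, row-keyed table.
 * Only cells that were written occupy storage, so a table may have
 * a huge number of possible columns while each row populates a few.
 */
final class SparseTable {
    // Row key -> (column name -> cell value), both kept in sorted order.
    private final TreeMap<String, TreeMap<String, Object>> rows = new TreeMap<>();

    /** Stores a value in one cell, creating the row on first write. */
    void put(String rowKey, String column, Object value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Returns the cell value, or null if the cell was never written. */
    Object get(String rowKey, String column) {
        Map<String, Object> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    /** Counts the cells actually stored across all rows. */
    long storedCells() {
        long count = 0;
        for (Map<String, Object> row : rows.values()) {
            count += row.size();
        }
        return count;
    }

    public static void main(String[] args) {
        SparseTable table = new SparseTable();
        // Sample keys and columns are made up for this example.
        table.put("feed-001", "meta:source", "rss");
        // A cell may hold a structured value, here a list of tags.
        table.put("feed-001", "payload:tags", Arrays.asList("news", "weather"));
        table.put("sensor-042", "reading:temp", 21.5);

        System.out.println("stored cells: " + table.storedCells());
        System.out.println("unwritten cell: " + table.get("sensor-042", "meta:source"));
    }
}
```

The nested sorted maps mean that an unwritten cell costs nothing, which is the property that makes a table with a million or more columns practical when each row uses only a handful of them. A distributed store adds partitioning, persistence, and versioning on top of this idea, none of which is modeled here.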