SBIR-STTR Award

An Analysis Process Execution Language and Execution Engine for High Energy Physics
Award last edited on: 1/25/2006

Sponsored Program
SBIR
Awarding Agency
DOE
Total Award Amount
$549,599
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Paul Brown

Company Information

FiveSight Technologies Inc

213 North Morgan Street
Chicago, IL 60607
   (312) 432-0556
   info@intalio.com
   www.FiveSight.com
Location: Single
Congr. District: 07
County: Cook

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2004
Phase I Amount
$99,599
The particle physics communities, working on the CERN Large Hadron Collider (LHC) project, are building infrastructures to support the processing of distributed datasets that require petascale computing resources (>10¹⁵ bytes and >10¹⁵ flops). This project will contribute to this endeavor by developing a formal process language and associated process execution environment targeted specifically at the high energy data analysis processes employed in the Grid computing environment. The approach would incorporate the following high-level features: a “lightweight” algebraic process language, an execution engine, transparent state management and recovery, integration with Grid infrastructure, and resource abstraction. Phase I will: (1) document an executable language for Grid-based high energy data analysis processes, containing a description of the syntax and semantics of the proposed process language; and (2) prototype a Grid-based execution engine for high energy data analysis processes, providing for execution of data analysis processes on a Globus Grid.
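
To make the idea of an algebraic process language with checkpointed execution concrete, the following is a minimal, hypothetical sketch in Python. The names (Task, Seq, Par, run) and the in-memory checkpoint are assumptions made for illustration only; they are not the awardee's actual language or engine, and real Grid dispatch, persistent state management, and Globus integration are omitted.

    # Hypothetical sketch: a toy algebraic process language with
    # sequential/parallel composition and per-task checkpointing.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Task:
        """A leaf process step: a named callable over an analysis context."""
        name: str
        action: Callable[[dict], dict]

    @dataclass
    class Seq:
        """Sequential composition: run steps left to right."""
        steps: List[object]

    @dataclass
    class Par:
        """Parallel composition: a real engine would dispatch branches to
        separate Grid resources; here they run in order for simplicity."""
        branches: List[object]

    def run(proc, ctx, checkpoint):
        """Evaluate a process term, checkpointing after each completed task
        so execution could resume after a failure (transparent recovery)."""
        if isinstance(proc, Task):
            ctx = proc.action(ctx)
            checkpoint(proc.name, ctx)
            return ctx
        if isinstance(proc, Seq):
            for step in proc.steps:
                ctx = run(step, ctx, checkpoint)
            return ctx
        if isinstance(proc, Par):
            for branch in proc.branches:
                ctx = run(branch, ctx, checkpoint)
            return ctx
        raise TypeError(f"unknown process term: {proc!r}")

    # Example: a two-stage analysis with a fan-out reconstruction step.
    workflow = Seq([
        Task("select-events", lambda c: {**c, "events": 1000}),
        Par([
            Task("reco-a", lambda c: {**c, "reco_a": True}),
            Task("reco-b", lambda c: {**c, "reco_b": True}),
        ]),
        Task("histogram", lambda c: {**c, "plots": 3}),
    ])

    state_log = []
    run(workflow, {}, lambda name, ctx: state_log.append((name, dict(ctx))))

The point of the composition operators is that the entire analysis is a single inspectable term, so the engine can record state at well-defined boundaries rather than inside opaque scripts.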

Commercial Applications and Other Benefits as described by the awardee:
A system that improves clarity of expression in research areas with large data sets should be of interest in such fields as engineering, biotechnology, pharmaceuticals, and epidemiology.

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
2005
Phase II Amount
$450,000
Current data analysis methodologies in high-energy physics often fall short when managing large-scale processing tasks over distributed datasets used by distributed members within a collaboration or working group. There is no common semantics for describing analysis workflow and its attributes across the myriad of complex process types comprising a typical physics study. Without a formal syntax, clarity and composition of methodologies, reproducibility of results, and portability of execution are difficult to achieve over the lifetime of a typical high-energy physics experiment. This project will develop process-oriented programming methods and environments for the production and analysis of distributed datasets in high-energy particle physics. This would result in an "Analysis Process Management" system comprising a reduction engine for process execution, a toolkit for user composition of processes, and a robust set of client tools for analysis, monitoring, and debugging of running processes. The focus of Phase I was on modeling the workflow and replacing the execution module. A prototype system was developed, which provided scalability, recoverability, and process clarity to the ATLAS experiment. Phase II will focus on software-controlled process orchestration, specifically the management and analysis of such processes. A foundation of generic tooling will be created, which could be used by operators in assessing and manipulating distributed datasets.
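
The "client tools for analysis, monitoring, and debugging of running processes" can be illustrated with a minimal, hypothetical sketch in Python. The ProcessRecord fields, Status values, and Monitor methods below are assumptions made for this illustration; they are not the awardee's actual Analysis Process Management interfaces.

    # Hypothetical sketch: a minimal monitoring view over running analysis
    # processes, so an operator can find stalled or failed jobs.
    from dataclasses import dataclass, field
    from enum import Enum
    from typing import Dict, List, Optional

    class Status(Enum):
        PENDING = "pending"
        RUNNING = "running"
        FAILED = "failed"
        DONE = "done"

    @dataclass
    class ProcessRecord:
        process_id: str
        dataset: str
        status: Status = Status.PENDING
        completed_steps: List[str] = field(default_factory=list)

    class Monitor:
        """Tracks process state so an operator can assess running analyses
        and decide which ones need debugging or restarting."""

        def __init__(self):
            self._records: Dict[str, ProcessRecord] = {}

        def register(self, record: ProcessRecord) -> None:
            self._records[record.process_id] = record

        def update(self, process_id: str, status: Status,
                   step: Optional[str] = None) -> None:
            rec = self._records[process_id]
            rec.status = status
            if step:
                rec.completed_steps.append(step)

        def failed(self) -> List[ProcessRecord]:
            """Return processes an operator may need to debug or restart."""
            return [r for r in self._records.values()
                    if r.status is Status.FAILED]

    # Example: register two processes and list the one that failed.
    mon = Monitor()
    mon.register(ProcessRecord("p1", "run-42"))
    mon.register(ProcessRecord("p2", "run-43"))
    mon.update("p1", Status.DONE, step="histogram")
    mon.update("p2", Status.FAILED, step="reco")
    print([r.process_id for r in mon.failed()])  # -> ['p2']

A real system would persist these records alongside the engine's checkpointed process state, so that monitoring, debugging, and recovery all operate on the same authoritative view of each running analysis.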

Commercial Applications and Other Benefits as described by the awardee:
A system that improves clarity of expression in applications with large data sets should be of interest in both the physics and business communities. Specific opportunities for commercialization include supply chain management, epidemiology, and computational biology.