SBIR-STTR Award

TAU Analytica: Bringing Advanced Data Analytics to Performance Analysis
Award last edited on: 9/5/2019

Sponsored Program
SBIR
Awarding Agency
DOE
Total Award Amount
$1,724,996
Award Phase
2
Solicitation Topic Code
03b
Principal Investigator
Nicholas Chaimov

Company Information

ParaTools Inc

1900 Millrace Drive Suite 104 Mailbox #1
Eugene, OR 97405
   (541) 913-8797
   info@paratools.com
   www.paratools.com
Location: Single
Congr. District: 04
County: Lane

Phase I

Contract Number: DE-SC0019700
Start Date: 2/19/2019    Completed: 2/18/2020
Phase I year
2019
Phase I Amount
$224,996
Powerful tools, including the TAU Performance System®, exist to collect, visualize, and analyze performance data about HPC applications. However, usability issues with traditional HPC programming languages, libraries, and frameworks are pushing users to newer, higher-level frameworks for specialized purposes, such as deep learning and data analytics. HPC systems, including leadership Department of Energy systems, are increasingly being called upon to support workloads like TensorFlow, Keras, PyTorch, Horovod, and Apache Spark. These frameworks relieve the user of worrying about data distribution and communication directly. However, existing performance tools are not well suited to collecting data from them, and single-purpose visualization tools require users to learn how to use them rather than reuse their knowledge of general-purpose visualization tools they already know. ParaTools, Inc. will address this problem by enhancing the TAU Performance System® to improve the usability and scalability of its data collection capabilities when applied to emerging data analytics and deep learning frameworks. We will provide new visualization and analysis tools to help users gain insightful and actionable information from their performance data. The new tools will be built using data analytics technologies, so that users can analyze the performance data of an application written in a data analytics framework with that same framework. Users will then be able to reuse their existing knowledge, rather than having to learn new skills specific to one tool. ParaTools, Inc. will develop TAU Analytica, to be composed of 1) a new, more scalable data format for performance profiles and 2) a new, more scalable performance visualization and analysis system designed to process profiles in the new format. In Phase I, we will first evaluate the feasibility of developing a new profile format, develop a prototype of that format, and integrate the prototype into TAU. 
The new format will be hierarchical and provide support for parallel readers and writers through a new API to be defined as part of this project. We will evaluate existing hierarchical data formats in the HPC space (such as HDF5) and in data analytics (such as Parquet and the formats supported by Apache Arrow). We will then evaluate the feasibility of developing replacements for TAU's visualization and analysis tools that use the new format, and develop prototypes of those tools. The new analysis tools will provide a web-based interface, which will improve remote usage of the tools. The Council on Competitiveness reports that over two-thirds of U.S. industry representatives claim their most demanding HPC applications could utilize a 10x increase in computing capability over the next five years, and over one-third could use a 1000x increase. The affordable performance engineering products developed through this SBIR project will fill a crucial need for improved compute capability utilization by improving software scalability, the most significant limiting factor to achieving a 10-fold improvement in performance.

Phase II

Contract Number: DE-SC0019700
Start Date: 4/6/2020    Completed: 4/5/2022
Phase II year
2020
Phase II Amount
$1,500,000
Powerful tools exist to collect, visualize, and analyze performance data about HPC applications. However, usability issues with traditional HPC programming languages, libraries, and frameworks are pushing users to newer, higher-level frameworks for specialized purposes, such as deep learning and data analytics. HPC systems, including leadership Department of Energy systems, are increasingly being called upon to support such workloads. These frameworks relieve the user of worrying about data distribution and communication directly. However, existing performance tools are not well suited to collecting data from them, and single-purpose visualization tools require users to learn how to use them rather than reuse their knowledge of general-purpose visualization tools they already know. This problem will be addressed by enhancing open-source performance tools to improve the usability and scalability of their data collection capabilities when applied to emerging data analytics and deep learning frameworks. We will provide new performance data collection, visualization, and analysis tools to help users gain insightful and actionable information from their performance data. The new tools will be built using data analytics technologies, so that users can analyze the performance data of an application written in a data analytics framework with that same framework. Users will then be able to reuse their existing knowledge, rather than having to learn new skills specific to one tool. In Phase I, a proof-of-concept tool has been developed that collects performance data for Data Analytics and Deep Learning applications and enables its analysis and visualization. The proof-of-concept tool is being used by early customers to analyze the performance of research code. In Phase II, the products developed in Phase I will be hardened into a production-ready, “shrink-wrapped” software distribution which automatically provides insightful performance data about Deep Learning applications. 
Software images will be provided for rapid deployment in many environments. The product will integrate with Deep Learning runtimes to gather performance data that non-integrated tools could not collect, which will reduce time spent by developers in diagnosing performance issues. The Council on Competitiveness reports that over two-thirds of U.S. industry representatives claim their HPC applications could utilize a 10x increase in computing capability, and over one-third could use a 1000x increase. The affordable performance engineering products developed through this SBIR project will fill a crucial need for improved compute capability utilization by improving software scalability and developer productivity, ultimately accelerating the pace of research and development.