SBIR-STTR Award

AI/ML Data Management Software System for NSRCs
Award last edited on: 2/25/2024

Sponsored Program
STTR
Awarding Agency
DOE
Total Award Amount
$1,346,472
Award Phase
2
Solicitation Topic Code
C53-15a
Principal Investigator
Maria Chan

Company Information

Visimo LLC

520 East Main Street Suite 200
Carnegie, PA 15106
   (412) 423-8324
   info@visimo.ai
   www.visimo.ai

Research Institution

Argonne National Laboratory

Phase I

Contract Number: DE-SC0022413
Start Date: 2/14/2022    Completed: 2/13/2023
Phase I year
2022
Phase I Amount
$200,000
Scientific research facilities are unable to collaborate and fully utilize microscopy image data due to data silos and a lack of common data management standards. Scientists at research facilities are burdened with laborious, complex processes to manage microscopy image data, reducing the impact of scientific datasets and research. This project aims to increase the impact of scientific datasets by delivering (1) an extensible software system and (2) microscopy tools which support findability, accessibility, interoperability, and reuse of data in multi-tool, multi-user scientific research facilities. The software system and microscopy tools will be developed in direct collaboration with scientific research facilities and scientists, improving the impact and reusability of datasets. The proposed software system is a collaboration platform that builds an ecosystem of integrated digital tools to support the scientific community. Accompanying models will fall into two categories, annotation and search/recall, allowing for the automatic searching and indexing of existing data stored in filesystems, the provisioning of suggested links between data and other resources, automated clustering and metadata suggestion, trend discovery, and streamlined data storage. Although the initial target is public sector nanoscience research centers for proof-of-concept and minimum viable product user testing, the proposed software system and machine learning models are applicable to a variety of use cases at other multi-tool, multi-user scientific research facilities, including 42 national laboratories and over 250 academic research institutions. The solution will enable users to significantly increase the impact of datasets in their facilities by supporting greater use and collaboration, allowing researchers to focus on advancing scientific innovations.
The proposed solution will also reduce the substantial costs associated with manual data management and decrease human error, which is known to corrupt datasets.

Phase II

Contract Number: DE-SC0022413
Start Date: 4/3/2023    Completed: 4/2/2025
Phase II year
2023
Phase II Amount
$1,146,472
Research labs are unable to effectively collaborate and fully utilize microscopy data due to data silos, insufficient data interoperability, and a lack of common data management standards. To address this problem, this innovation will increase the impact of scientific datasets by delivering (1) an extensible software system and (2) microscopy tools which support findability, accessibility, interoperability, and reuse of data in multi-tool, multi-user scientific research facilities. Challenges within microscopy labs are heightened by the complexity and power of microscopy techniques. While localized efforts have addressed disparate data standards, no large-scale, cloud-enabled collaborative tool exists, and the manual annotation of imagery makes data difficult to index, organize, and reference. Few standard data-sharing tools are available, and data-sharing principles in microscopy are severely lacking. Conventional, time-consuming practices remain prevalent, including sharing research findings solely via publication and only upon request. Some researchers have built their own homegrown solutions. Metadata is often incomplete and data management suffers, as discovered during Phase I interviews conducted with scientists, researchers, and other potential end-users. In Phase I, working with scientific research facilities, an architecture for the tool, designed to maximize findability of data, was developed. A cloud-deployable web application to store metadata on microscopy datasets was designed, and wireframes for a user interface were approved. Using a machine learning model, a data ingestion pipeline was implemented to minimize collaboration obstacles between researchers. Phase II will focus on a second pipeline component: an automatic tagging machine learning model.
The platform has been designed to implement the following key characteristics in the Phase II effort: automated ingestion, applicability across file types, cloud storage and access, automated metadata tagging, data cataloging, automated attribution, user access control, automated curation, and accessibility to previously “dark” data. Improved collaboration and data availability will save time and money, help to validate research results, enable the combination of data types and the reuse of hard-to-generate data, accelerate ideas for future research, and benefit data sharers. Transcribing and anonymizing data may take up to one hour per minute fragment. Data documentation, including adding descriptive metadata, may take four hours per experiment and require 60 metadata fields. The proposed innovation could reduce the time spent documenting data by 50%, increasing researcher efficiency and saving laboratories money. Additionally, to use microscopy data more efficiently, networks should transfer and store data at a speed of at least 1 gigabit per second and provide centralized storage. This technology will provide both centralized storage and software that performs at that speed.
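As a rough illustration of the figures quoted above, the following minimal Python sketch works through the arithmetic. The four hours per experiment, the 50% reduction, and the 1 gigabit per second rate come from the abstract; the 10 GB dataset size and both function names are hypothetical, chosen only for illustration, and the transfer estimate ignores protocol overhead.

```python
def documentation_hours_saved(hours_per_experiment: float, reduction: float) -> float:
    """Hours saved per experiment for a given fractional reduction (0.5 = 50%)."""
    return hours_per_experiment * reduction


def transfer_seconds(size_gigabytes: float, link_gbit_per_s: float) -> float:
    """Idealized transfer time: decimal gigabytes times 8 bits per byte, over link rate."""
    return size_gigabytes * 8 / link_gbit_per_s


# Figures from the abstract: 4 hours of documentation per experiment, 50% reduction.
saved = documentation_hours_saved(4.0, 0.50)

# Assumption: a hypothetical 10 GB microscopy dataset on a 1 Gbit/s link.
seconds = transfer_seconds(10.0, 1.0)

print(f"Documentation time saved per experiment: {saved} hours")
print(f"Idealized transfer time for 10 GB at 1 Gbit/s: {seconds} seconds")
```

Under these assumptions, the sketch gives two hours saved per experiment and an idealized transfer time of 80 seconds for the hypothetical dataset.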