Phase II Amount
$1,146,472
Research labs cannot collaborate effectively or make full use of microscopy data because of data silos, insufficient data interoperability, and a lack of common data management standards. To address this problem, this innovation will increase the impact of scientific datasets by delivering (1) an extensible software system and (2) microscopy tools that support the findability, accessibility, interoperability, and reuse of data in multi-tool, multi-user scientific research facilities. These challenges are amplified in microscopy labs by the complexity and power of modern microscopy techniques. While localized efforts have addressed disparate data standards, no large-scale, cloud-enabled collaborative tool exists, and manual annotation of imagery makes data difficult to index, organize, and reference. Few standard data-sharing tools are available, and data-sharing principles in microscopy are severely lacking. Conventional, time-consuming practices remain prevalent, such as sharing research findings only through publication and sharing data only upon request, and some researchers have built their own home-grown solutions. Phase I interviews with scientists, researchers, and other potential end users revealed that metadata is often incomplete and that data management suffers as a result.
In collaboration with scientific research facilities, Phase I developed a tool architecture designed to maximize data findability. A cloud-deployable web application for storing metadata about microscopy datasets was designed, and wireframes for its user interface were approved. A data ingestion pipeline that uses a machine learning model was implemented to reduce obstacles to collaboration between researchers. Phase II will focus on a second pipeline component: a machine learning model for automatic metadata tagging.
In Phase II, the platform is designed to provide the following key capabilities: automated ingestion, applicability across file types, cloud storage and access, automated metadata tagging, data cataloging, automated attribution, user access control, automated curation, and access to previously dark data. Improved collaboration and data availability will save time and money and will help validate research results. They will also allow researchers to combine data types and reuse hard-to-generate data, accelerate ideas for future research, and benefit those who share their data.
Transcribing and anonymizing data may take up to one hour per one-minute fragment, and documenting data, including adding descriptive metadata, may take four hours per experiment and require 60 metadata fields. The proposed innovation could cut the time spent documenting data by 50%, increasing researcher efficiency and saving laboratories money. In addition, to use microscopy data efficiently, labs need centralized storage and networks that can transfer and store data at a rate of at least 1 gigabit per second (Gbps). This technology will provide both centralized storage and software that performs at that speed.
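As a rough illustration of the figures quoted above, the sketch below turns them into back-of-the-envelope numbers. It assumes the 50% reduction applies uniformly to the four-hour documentation estimate and that a link delivers its full nominal 1 Gbps with no protocol or storage overhead. The experiment count and the 500 GB dataset size are hypothetical values chosen only for illustration, not figures from this project.

```python
# Back-of-the-envelope estimates based on the figures quoted in the abstract.
# Assumptions (hypothetical, for illustration only): the 50% reduction applies
# uniformly to the 4-hour documentation estimate, and the link delivers its full
# nominal rate with no overhead. Sizes use decimal units (1 GB = 10**9 bytes).

DOC_HOURS_PER_EXPERIMENT = 4.0   # documentation time per experiment (from the text)
DOC_TIME_REDUCTION = 0.50        # projected reduction in documentation time (from the text)
LINK_GBPS = 1.0                  # minimum recommended network speed (from the text)


def documentation_hours_saved(experiments: int) -> float:
    """Hours of documentation saved across a number of experiments."""
    return experiments * DOC_HOURS_PER_EXPERIMENT * DOC_TIME_REDUCTION


def transfer_seconds(dataset_gb: float, link_gbps: float = LINK_GBPS) -> float:
    """Idealized time in seconds to move `dataset_gb` gigabytes over the link."""
    bits = dataset_gb * 8e9            # gigabytes -> bits
    return bits / (link_gbps * 1e9)    # bits divided by bits per second


if __name__ == "__main__":
    # Hypothetical workload: 100 experiments and a 500 GB imaging dataset.
    print(f"Hours saved over 100 experiments: {documentation_hours_saved(100):.0f}")
    print(f"Minutes to transfer 500 GB at 1 Gbps: {transfer_seconds(500) / 60:.1f}")
```

Under these assumptions, each experiment saves about two hours of documentation, and a 500 GB dataset takes roughly 67 minutes to move at the nominal 1 Gbps; real-world throughput would be lower once protocol and storage overheads are included.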