SBIR-STTR Award

Setac: Enhancing Usability of Archived Weather Data in the Digital Age
Profile last edited on: 6/11/2022

Program
SBIR
Agency
DOC | NOAA
Total Award Amount
$149,882
Award Phase
1
Principal Investigator
Jacob Vosburgh
Activity Indicator

Company Information

IAVO Research and Scientific (AKA: International Association of Virtual Organizations, Incorporated)

4011 University Drive Suite 204
Durham, NC 27701
   (919) 433-2400
   mreeves@iavo-rs.com
   www.iavo-rs.com
Multiple Locations:   
Congressional District:   01
County:   Durham

Phase I

Phase I year
2021
Phase I Amount
$149,882
NOAA has used historical documents such as ship logs and many other resources to collect weather data critical to modeling global and regional climate and weather conditions. The optical character recognition (OCR) technology developed over the past three decades remains limited in its ability to recognize handwriting and reliably extract text in context. Machine learning (ML) algorithms can help improve these processes. Given the importance of accuracy for weather data, we propose the development and testing of a custom OCR/text extraction application built using OpenCV and Tesseract. Both are open source and operate within the open-source Python environment. PyTorch will be evaluated as the deep learning library to optimize the OpenCV and Tesseract integration and post-processing. We submit that, compared to previous efforts, this integration will provide more flexible pre-processing without requiring undue complexity or specialized understanding, require less post-processing, and establish a framework for adding automation to pre-/post-processing and tuning. Our objectives are to:
1. Demonstrate the feasibility of OpenCV as an image pre-processing tool for document layout analysis
2. Demonstrate the feasibility of Tesseract as a text extraction tool
3. Demonstrate the feasibility of using PyTorch as an adaptive deep learning library for post-processing and information extraction
4. Validate performance
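The abstract does not say how performance would be validated. One common way to score OCR output is the character error rate (CER): the edit distance between the extracted text and a hand-transcribed reference, divided by the reference length. The sketch below is only an illustration under that assumption, not the project's method. The function names `levenshtein` and `character_error_rate` are hypothetical, and the sample log strings are made up rather than taken from any archive.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # skip a reference character
                curr[j - 1] + 1,                  # skip a hypothesis character
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up example: the OCR output misreads one digit of a pressure value.
    truth = "Bar. 29.84 Wind NW"
    ocr_output = "Bar. 29.34 Wind NW"
    print(f"CER = {character_error_rate(truth, ocr_output):.3f}")
```

For weather records, a single misread digit changes the observation, so in practice a character-level score like this would likely be paired with a field-level check on the extracted values.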

Phase II

Phase II year
---
Phase II Amount
---