Setac: Enhancing Usability of Archived Weather Data in the Digital Age
Profile last edited on: 6/11/2022

Total Award Amount
Award Phase
Principal Investigator
Jacob Vosburgh
Activity Indicator

Company Information

IAVO Research and Scientific (AKA: International Association of Virtual Organizations, Incorporated)

4011 University Drive Suite 204
Durham, NC 27701
(919) 433-2400
Multiple Locations:
Congressional District: 01
County: Durham

Phase I

Phase I year
Phase I Amount
NOAA has used historical documents such as ship logs, along with many other resources, to collect weather data critical to modeling global and regional climate and weather conditions. To date, the optical character recognition (OCR) technology developed over the past three decades remains limited in its ability to recognize handwriting and reliably extract text in context. Machine Learning (ML) algorithms can help improve these processes.

Given the importance of accuracy for weather data, we propose the development and testing of a custom OCR/text extraction application built using OpenCV and Tesseract. Both are open-source and operate within the open-source Python environment. PyTorch will be evaluated as the deep learning library to optimize the OpenCV and Tesseract integration and post-processing. We submit that, compared to previous efforts, this integration will provide more flexible pre-processing without undue complexity, require less post-processing, and establish a framework for adding automation to pre-/post-processing and tuning.

Our objectives are to:
1. Demonstrate feasibility of OpenCV as an image pre-processing tool for document layout analysis
2. Demonstrate feasibility of Tesseract as a text extraction tool
3. Demonstrate feasibility of using PyTorch as an adaptive deep learning library for post-processing and information extraction
4. Validate performance
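
The abstract does not say how performance will be validated. One common way to score OCR output is the character error rate (CER): the edit distance between the extracted text and a hand-checked transcription, divided by the length of the transcription. The following is a minimal sketch under that assumption, not the project's method; the function names and the sample log entry are hypothetical and chosen only for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning hypothesis
    into reference."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # reference char missing from hypothesis
                curr[j - 1] + 1,     # extra char in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical ship-log entry and a hypothetical OCR reading of it.
    truth = "Wind NW 12 kt, 29.85 in"
    ocr_output = "Wind NW l2 kt, 29.35 in"
    print(f"CER: {character_error_rate(truth, ocr_output):.3f}")
```

A lower CER means the extracted text is closer to the transcription. For numeric weather readings, where one wrong digit changes the value, a field-level exact-match rate may be a useful companion metric.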

Phase II

Phase II year
Phase II Amount