SBIR-STTR Award

Hierarchical Dynamic Exploitation of FMV (HiDEF)
Award last edited on: 10/5/2020

Sponsored Program
SBIR
Awarding Agency
DOD : AF
Total Award Amount
$899,923
Award Phase
2
Solicitation Topic Code
AF151-042
Principal Investigator
Kevin Corbey

Company Information

Commonwealth Computer Research Inc (AKA: CCRi)

1422 Sachem Place Unit 1
Charlottesville, VA 22901
   (434) 977-0600
   info@ccri.com
   www.ccri.com
Location: Single
Congr. District: 05
County: Albemarle

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2015
Phase I Amount
$149,923
Recent advances in machine learning have dramatically advanced the state of the art in related tasks such as image recognition and machine translation. Most of this progress has centered on families of neural network algorithms broadly called “deep learning.” We believe there is a significant opportunity to apply these breakthroughs in image, text, and video processing, leveraging a collection of deep learning techniques to dramatically improve the automated understanding of full motion video collected from aerial platforms. Furthermore, the representation learned from the raw video data will be sufficiently rich that it will be possible to automatically extract a text description of the content. This generated text can subsequently be used for accurate semantic discovery of video content from analyst-formulated natural language queries or questions, and for fusion with existing knowledge bases of information extracted from text corpora. This will enable important indications and warnings, and it will increase the availability of forensic data that can be analyzed to develop predictive algorithms, making it possible to identify future threats sooner.
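
As a rough illustration of the retrieval half of this pipeline, the sketch below embeds video segments and a natural language query into a shared vector space and ranks segments by cosine similarity. The architecture, module names, and dimensions are illustrative assumptions in PyTorch, not the actual HiDEF models.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VideoEncoder(nn.Module):
        """Mean-pools per-frame CNN features and projects them into a shared space."""
        def __init__(self, frame_dim=2048, embed_dim=256):
            super().__init__()
            self.proj = nn.Linear(frame_dim, embed_dim)

        def forward(self, frames):             # frames: (batch, n_frames, frame_dim)
            pooled = frames.mean(dim=1)        # simple temporal pooling
            return F.normalize(self.proj(pooled), dim=-1)

    class QueryEncoder(nn.Module):
        """Encodes a tokenized natural-language query with an LSTM."""
        def __init__(self, vocab_size=10000, embed_dim=256):
            super().__init__()
            self.tokens = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, embed_dim, batch_first=True)

        def forward(self, token_ids):          # token_ids: (batch, seq_len)
            _, (h, _) = self.lstm(self.tokens(token_ids))
            return F.normalize(h[-1], dim=-1)  # final hidden state as query vector

    # Rank stored video segments against an analyst query by cosine similarity.
    video_enc, query_enc = VideoEncoder(), QueryEncoder()
    segments = torch.randn(100, 16, 2048)      # 100 segments x 16 frames of CNN features
    query = torch.randint(0, 10000, (1, 8))    # one 8-token query (stand-in token ids)
    scores = query_enc(query) @ video_enc(segments).T
    top5 = scores.topk(5).indices              # best-matching segment indices

Because both encoders normalize their outputs, the matrix product directly yields cosine similarities, so ranking reduces to a single top-k operation.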

Benefits:
The approach that we have outlined, while targeted at recognizing and describing activities of interest in aerial surveillance videos, is widely applicable to understanding the content of many different varieties of video sources. Within the Department of Defense (DoD) and the Intelligence Community (IC), the need for this capability should only grow, for example as more and better drones become available to units deployed in foreign locations. Drones with cameras are a cheap and effective way to perform surveillance over an area, but only with software tools that can prioritize video content through automated understanding. DoD and IC organizations that we will target include the Marine Corps, small deployable/expeditionary Army units, and the CIA.

Additionally, we expect the law enforcement market for this technology to be significant, for similar reasons. Organizations such as police departments and the Coast Guard are only now beginning to experiment with drones and surveillance cameras. While the size of this market depends on the extent to which society accepts this variety of monitoring, we expect a large number of scenarios in which it is deemed acceptable, for example monitoring the United States border (DHS and Border Patrol), the coasts (Coast Guard), and areas surrounding prisons.

However, the largest possible market may be in commercial rather than government applications. The private security market is large and growing. Our proposed solution offers a valuable product that could complement the offerings of existing commercial security companies, which are unlikely to have the advanced technology required to automatically detect activities of interest in their security videos. Instead, they often employ people whose job it is to monitor these videos. Not only is it expensive to pay employees for this task; suspicious or otherwise interesting activities will generally only be noticed if a person happens to be monitoring the camera at the right time.

Beyond commercial security, we envision use cases in both the television news industry and the technology industry. In television news, reporters film many activities that include people or events of interest. This footage is often used immediately, for an upcoming news report, and it must be watched and edited to produce that report; our system could streamline this process. Our system could also be especially useful in analyzing and cataloging the content of video segments so that they can be saved and semantically recovered from a historical repository for future research and programming. In the technology industry, millions of users upload and store millions of videos on sites such as Facebook, YouTube, and Vine. The industry has only scratched the surface of what can be done to enable better organization, automated understanding, and discovery of these videos, for example improving the user experience by automatically tagging videos. The models that we will develop in this effort will be enabling capabilities for each of these challenges, and so offer a large set of business-to-business opportunities.

Keywords:
Full Motion Video, machine learning, neural networks, tactical video exploitation, semantic extraction

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
2016
Phase II Amount
$750,000
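
As a hedged sketch of the end-to-end ingest, analyze, index, and search loop that the Phase II abstract describes, the following Python stub stores normalized segment embeddings in a brute-force nearest-neighbor index. The SegmentIndex class, its methods, and the random stand-in embeddings and captions are all illustrative assumptions, not the actual Phase II indexing scheme.

    import numpy as np

    class SegmentIndex:
        """Toy in-memory index: normalized embeddings plus per-segment metadata."""
        def __init__(self, dim=256):
            self.vectors = np.empty((0, dim), dtype=np.float32)
            self.metadata = []                   # (video_id, start_s, end_s, caption)

        def ingest(self, video_id, embeddings, captions, spans):
            """Add one video's candidate segments to the searchable index."""
            unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
            self.vectors = np.vstack([self.vectors, unit.astype(np.float32)])
            self.metadata += [(video_id, s, e, c)
                              for (s, e), c in zip(spans, captions)]

        def search(self, query_embedding, k=5):
            """Return the k segments whose embeddings best match the query."""
            q = query_embedding / np.linalg.norm(query_embedding)
            scores = self.vectors @ q
            best = np.argsort(scores)[::-1][:k]  # highest cosine similarity first
            return [(self.metadata[i], float(scores[i])) for i in best]

    # Toy usage with random stand-in embeddings and captions.
    index = SegmentIndex()
    index.ingest("sortie_042", np.random.randn(3, 256),
                 ["truck stops at gate", "two people meet", "vehicle departs"],
                 [(0, 12), (12, 30), (30, 45)])
    hits = index.search(np.random.randn(256), k=2)

A brute-force dot product stands in here for whatever fast look-up scheme the production system uses; swapping in an approximate nearest-neighbor structure would preserve the same ingest/search interface while scaling to much larger video archives.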
Video collected by aerial platforms can be a significant source of situational awareness. Recent progress in video capture and storage technology makes it possible to scale the gathering and storage of video to greater levels, but the ability to extract semantic meaning has remained a bottleneck because of its dependence on human participation. Previous attempts to automate this review relied on image segmentation approaches that ignored valuable contextual information. VLADE uses deep learning to analyze patterns in training data to enable the generation of descriptions of new videos. For the Phase I system, we built an algorithm to select candidate video segments from longer videos, an indexing scheme for fast look-up, and models for embedding video segments, for generating text descriptions, and for embedding sentences. Phase II will improve model speed and accuracy, incorporate both synthetic and Air Force data, and implement training on a distributed cloud for greater speed and scalability. It will also enhance the model to leverage long-term temporal dependencies in long videos, enable streaming input and user feedback, and integrate all of these components into an end-to-end system that can efficiently ingest and analyze a video and then index it for search.