SBIR-STTR Award

Automated Audio Clustering
Award last edited on: 11/1/2018

Sponsored Program
SBIR
Awarding Agency
DOD : Navy
Total Award Amount
$900,259
Award Phase
2
Solicitation Topic Code
N112-163
Principal Investigator
John Kominek

Company Information

Voci Technologies Incorporated (AKA: Silicon Vox Corporation)

6301 Forbes Avenue Suite 120
Pittsburgh, PA 15217
   (412) 621-9310
   info@vocitec.com
   www.vocitec.com
Location: Single
Congr. District: 18
County: Allegheny

Phase I

Contract Number: N00014-12-M-0034
Start Date: 10/11/2011    Completed: 8/10/2012
Phase I year
2012
Phase I Amount
$150,000
Voci Technologies Incorporated (Voci) is the leading small business developing accelerated Human Language Technology based solutions. Voci is partnering with Richard M. Stern, a Voci advisor and Professor at Carnegie Mellon University (CMU), to develop an Automated Speaker Clustering System (ASCS). The proposed ASCS will be developed by integrating Voci's best-in-class, patent-pending HyperVox technology with the latest speaker identification (SID) capabilities from CMU. The proposed system is uniquely architected to provide tuning parameters that enable tradeoffs between false-positive and false-negative rates, and the ability to simulate the impact of improvements to individual ASCS components, which is essential to developing reliable performance specifications. The proposed ASCS uses a parallel set of proprietary techniques to optimize the extraction of voice features in both batch and streaming modes. The resulting voice features are fused with a reliable word list to provide a clustering decision together with a confidence estimate on the match between the audio sample and the nearest speaker cluster. At the end of Phase I, the team will demonstrate the automated clustering of audio files. The team believes its final ASCS implementation will be able to automatically cluster tens to hundreds of thousands of audio files per hour with useful true/false positive rates.

Benefit:
The intent of this effort is to produce a "dual use" capability that meets the needs of the US Navy, DoD and commercial applications. A critical consideration in any commercial product is that it is open and easily integratable with other systems. Voci envisions this powerful new Automatic Speaker Clustering System (ASCS) technology to be embedded in existing Voci products, enhancing these systems' ability (i) to provide an additional security layer, without requiring the cost of integrating into an existing interactive voice response (IVR) system, (ii) to more effectively identify individuals of interest for the purpose of preventing fraud and other crimes, and (iii) to render customer relationship management (CRM) systems more effective in dealing with individual customers. These application spaces can be considered part of the enterprise analytics space, which was estimated to be over $10B in revenue in 2010.

Keywords:
(4) Gender Identification, (6) Word Spotting, (5) Language Identification, (8) Large Vocabulary Continuous Speech Recognition (LVCSR), (2) Human Language Technology, (7) Confidence Score, (1) Automated Audio clustering, (3) Speaker Identification

Phase II

Contract Number: N00014-13-C-0091
Start Date: 1/8/2013    Completed: 8/23/2014
Phase II year
2013
(last award dollars: 2016)
Phase II Amount
$750,259

Voci Technologies Incorporated (Voci) is the leading small business developing accelerated Human Language Technology based solutions. In Phase I of this SBIR, Voci demonstrated the feasibility of automatically clustering audio with useful false-positive and false-negative rates. In Phase II, Voci is partnering with Vickers & Nolan Enterprises to develop a prototype Automated Speaker Clustering System (ASCS) and accelerate the technology transfer process. The prototype ASCS will be extended beyond the experimental ASCS in several important ways. The prototype ASCS will incorporate diarization to support the automatic clustering of unsegmented multi-speaker audio. The prototype ASCS will incorporate feature and model robustness that will extend the system's application beyond telephonic audio to other types of recordings of importance to the Navy. The prototype will have improved usability: it will run faster, support the clustering of a larger number of speakers, and support the clustering of audio cuts of shorter length, all with better accuracy. Finally, the prototype ASCS will be architected to be maintainable and extensible so that it can be evaluated under realistic deployment conditions and matured within the Phase II Option.

Benefit:
The intent of this effort is to produce a "dual use" capability that meets the needs of the US Navy, DoD and commercial applications. A critical consideration in any commercial product is that it is open and easily integratable with other systems. Voci envisions this powerful new Automatic Speaker Clustering System (ASCS) technology to be embedded in existing Voci products, enhancing these systems' ability (i) to provide an additional security layer, without requiring the cost of integrating into an existing interactive voice response (IVR) system, (ii) to more effectively identify individuals of interest for the purpose of preventing fraud and other crimes, and (iii) to render customer relationship management (CRM) systems more effective in dealing with individual customers. These application spaces can be considered part of the enterprise analytics space, which was estimated to be over $10B in revenue in 2010.

Keywords:
(8) Large Vocabulary Continuous Speech Recognition (LVCSR), (3) Speaker Identification, (6) Word Spotting, (4) Gender Identification, (7) Confidence Score, (1) Automated Audio clustering, (5) Language Identification, (2) Human Language Technology