SBIR-STTR Award

Variable Speed Speech Synthesis
Award last edited on: 11/14/2018

Sponsored Program
SBIR
Awarding Agency
DOD : Navy
Total Award Amount
$895,598
Award Phase
2
Solicitation Topic Code
N08-149
Principal Investigator
Minkyu Lee

Company Information

Advanced Media Research Inc (AKA: AMR)

422 Executive Drive
Princeton, NJ 08540
   (609) 430-0900
   info@amrnd.com
   www.amrnd.com
Location: Single
Congr. District: 12
County: Mercer

Phase I

Contract Number: N68335-08-C-0429
Start Date: 8/12/2008    Completed: 1/20/2010
Phase I year
2008
Phase I Amount
$145,798
The objective of this proposal is to demonstrate the feasibility of developing variable speed speech synthesis technology. We plan to use open source TTS systems because they often provide flexibility and interoperability, which are essential for research-oriented work. To modify speaking speed, we plan to focus on time-domain time-scale modification algorithms, which provide good quality at lower computational complexity than other approaches such as sinusoidal models or vocoder-based approaches. We will test time-domain methods including SOLA, PSOLA, and WSOLA. We will apply a linear scaling factor, which modifies the duration regardless of whether the speech segment is a silence, a transient, or a sustained vowel. We will also apply different scaling factors to different parts of speech segments. During the optional six months, we will focus on creating multiple voices by modifying the voice type, gender, dialect (accent), and perceived emotion of the speech. Based on source-filter models, we will investigate algorithms for modifying source and filter characteristics, from which many different voices can be generated.
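As an illustration of the time-domain approach named in the abstract, the following is a minimal WSOLA-style sketch in pure Python. It is not the awardee's implementation. The function name `wsola`, the parameters `frame` and `tolerance`, and their default values are assumptions chosen for this example. It applies one linear scaling factor to the whole signal, regardless of segment type.

```python
import math

def hann(n):
    """Periodic Hann window; two copies at 50% overlap sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def wsola(x, speed, frame=256, tolerance=64):
    """Time-scale the sample list x by a uniform factor.

    speed > 1 shortens the output (faster speech); speed < 1 lengthens it.
    frame and tolerance are in samples and are illustrative defaults only.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if len(x) < frame + tolerance:
        raise ValueError("signal too short for the chosen frame and tolerance")
    hop = frame // 2                       # synthesis hop (50% overlap)
    win = hann(frame)
    n_frames = max(1, int((len(x) - frame - tolerance) / (hop * speed)))
    out = [0.0] * (n_frames * hop + frame)
    prev = 0                               # start of the frame copied last
    for k in range(n_frames):
        target = int(round(k * hop * speed))   # ideal analysis position
        best = target
        if k > 0:
            # Natural continuation of the previous frame in the input.
            ref = x[prev + hop: prev + hop + frame]
            lo = max(0, target - tolerance)
            hi = min(len(x) - frame, target + tolerance)
            best_score = -math.inf
            # Pick the candidate near the target that best matches the
            # continuation (unnormalized cross-correlation).
            for cand in range(lo, hi + 1):
                score = sum(a * b for a, b in zip(ref, x[cand:cand + frame]))
                if score > best_score:
                    best_score, best = score, cand
        # Overlap-add the windowed frame at the synthesis position.
        for i in range(frame):
            out[k * hop + i] += win[i] * x[best + i]
        prev = best
    return out

# Example: a 200 Hz tone sampled at 8 kHz, played back twice as fast.
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(4000)]
fast = wsola(tone, 2.0)
```

The output length is roughly the input length divided by `speed`. Applying different scaling factors to different parts of the speech, as the abstract also proposes, would require varying the analysis step per segment, which this sketch does not attempt.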

Benefit:
The time-scale modification technology will be of tremendous commercial value. Transforming a speech or audio signal to an alternative time scale can be a useful digital audio effect. It can be used for fast browsing of speech material in digital libraries and distance learning, fast or slow playback on telephone answering machines and dictaphones, accelerated aural reading for the blind, and editing of audio/visual recordings to fit allocated timeslots in the radio and television industry. The ability to change the voice characteristics of TTS speech will enable new applications in various fields in addition to generating multiple voices. It will be an innovative technology for businesses in virtual world environments, the children's toy industry, the web-based application software industry, the on-line gaming industry, the on-line service and entertainment industry, the movie industry, and the animation (cartoon) industry.

Keywords:
Text-to-Speech, Speaking Speed, Voice Conversion, Variable speed, Multiple voices

Phase II

Contract Number: N61339-10-C-0037
Start Date: 9/28/2010    Completed: 9/28/2012
Phase II year
2010
Phase II Amount
$749,800
The main objective of this project is to develop technologies for variable-speed speech synthesis, or Text-to-Speech (TTS). A TTS system converts written text into spoken language. There are numerous commercial TTS systems; however, most do not allow speed control of the output speech. In simulation-based virtual training systems, TTS is used to generate the voices of the virtual role-players. The ability to control the speaking speed of TTS output without sacrificing intelligibility will support a range of fast-paced operational and training scenarios for Aviation Training Systems. Phase I focused on a feasibility study of variable speed speech synthesis technology. This proposal continues that effort into Phase II, where the main goal is to develop a working prototype that enables realistic and adjustable speed control of synthetic speech that is intelligible enough to support a range of fast-paced operational and training scenarios. An additional goal of Phase II is to provide the capability of mimicking the “radio voice” of military personnel, as well as the capability of generating multiple voices from a single TTS system.

Keywords:
Voice Conversion, Voice Transformation, Speech Synthesis, Variable Speed, Time Scale Modification, Text-To-Speech