SBIR-STTR Award

A Cloud-Based Service for Audio Access to News and Blogs
Award last edited on: 9/23/2015

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$911,755
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Radhika Thekkath

Company Information

AgiVox Inc

440 North Wolfe Road
Sunnyvale, CA 94085
   (650) 996-0224
   listen@agivox.com
   www.agivox.com
Location: Single
Congr. District: 17
County: Santa Clara

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2012
Phase I Amount
$149,999
The innovation improves access to and discovery of online written content that is automatically converted to an audio format such as MP3. In general, audio synthesized from arbitrary written text delivers a poor listening experience. The technical effort is motivated by the absence of applications that deliver a user's preferred news articles and blogs as high-quality, phonetically correct synthesized audio, whether for a visually impaired person or for a multitasking, visually busy person such as a car driver. This work uses techniques such as textual processing informed by text understanding, together with content analysis driven by domain knowledge and machine learning. Machine learning techniques are used to improve speech synthesis and to incorporate automatic discovery of user preferences into listenable news. Since scanning content by listening is slower than visually scanning for relevant results, this technical work will improve the auditory search process by combining user input with information retrieval for a smoother user experience. The resulting technology infrastructure is expected to support an array of compelling commercial products with far-reaching implications.

The broader/commercial impact of this technical work comes from the cloud-based infrastructure that can process online written text into high-quality audio. This cloud software benefits from effectively unlimited storage and computing capacity, which it uses to support content retrieval, machine learning, text preprocessing, content discovery, natural language processing, and interaction with commercial Text-to-Speech servers. The first version of this technology will focus on news and blogs, a corpus of information large enough to pose a challenge while also holding considerable commercial interest. The cloud infrastructure can support a range of client-side applications that run on smartphones, tablets, and desktops. These applications will have access to high-quality synthesized audio usable in "eyes-busy" situations and by the low-vision community. The apps will provide customizable access to user-preferred content via intelligent information retrieval. While there is commercial potential in such client applications, the greater value lies in licensing the server technology. The societal impact of such a product is tremendous, since neither the blind community nor the general public currently has such easy listening access to the large corpus of online content, curated by user preferences and with an application control mechanism operated entirely via voice and finger gestures.
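The abstract describes text preprocessing performed in the cloud before handing content to commercial Text-to-Speech servers. The award does not disclose the actual implementation; the Python sketch below only illustrates that kind of preprocessing step, and the function names, abbreviation rules, and the synthesize_to_mp3 stand-in are hypothetical.

    import re

    # Hypothetical stand-in for the commercial Text-to-Speech servers the
    # abstract mentions; the actual vendor and API are not specified.
    def synthesize_to_mp3(text: str, out_path: str) -> None:
        raise NotImplementedError("call a commercial TTS service here")

    # Example expansion rules so the synthesized audio is phonetically
    # sensible; a real system would use richer NLP and domain knowledge.
    ABBREVIATIONS = {
        r"\bDr\.": "Doctor",
        r"\bSt\.": "Saint",
        r"\bU\.S\.": "United States",
        r"\be\.g\.": "for example",
    }

    def preprocess_article(raw_text: str) -> str:
        text = re.sub(r"<[^>]+>", " ", raw_text)      # strip leftover HTML tags
        for pattern, spoken in ABBREVIATIONS.items():  # expand abbreviations
            text = re.sub(pattern, spoken, text)
        return re.sub(r"\s+", " ", text).strip()       # normalize whitespace

    if __name__ == "__main__":
        article = "Dr. Smith writes for a U.S. paper. <b>Read more</b> online."
        clean = preprocess_article(article)
        print(clean)
        # synthesize_to_mp3(clean, "article.mp3")  # hand off to the TTS back end

In the described architecture this normalization would run server-side, so client apps on phones and tablets only stream the finished audio.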

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
2014
(last award dollars: 2016)
Phase II Amount
$761,756

This SBIR Phase II project will research algorithms for automatic discovery of a user's topics of interest based on existing written content such as news and blogs. These topics can then be used to dynamically create a personalized content reader with high-quality synthesized audio. Applications created from this cloud-based project will enable a car driver to get instant and relevant information without taking her eyes off the road, a must for today's lengthy commutes and in-car safety. Other applications will provide the same audio access to Internet news and blogs via a smartphone for people who are vision-impaired or vision-busy while exercising, gardening, and so on. This project has a societal impact for the blind, for aging eyes that have difficulty reading small print and small screens, and for the car driver, since it provides the ability to find contextual news and blogs in an "eyes-free" manner with an easy listening experience. The fundamental research components from this project can be re-applied to other content and to similar fields of research. This project's applications have the potential to generate a revenue stream, which will in turn create jobs and have an overall impact on the economy.

The goal of the research is to determine whether topic information extracted from a large corpus of unrelated documents using unsupervised machine learning can be used for content discovery, for improving the quality of synthesized speech, and for discovering user preferences for a recommendation system. The research uses topic modeling, a machine learning algorithm, to uncover topics across thousands of RSS feeds, and natural language processing to improve the quality of synthesized speech. Retrieval of specific RSS channels matching a user's topic preferences is then possible by using the probabilistic mapping between topics and RSS channels. Since scanning content by listening is slower than visually scanning for relevant results, the current research will improve this process by combining user preferences with information retrieval for a better user experience. The topic discovery research includes three key components: discovering multiple levels of subtopics to create topic hierarchies for easier browsing, identifying trending topics, and determining current and relevant topics for automated audio content. Once this project is shown to produce effective results, the same techniques can be applied to other document collections.
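The abstract outlines unsupervised topic modeling over RSS feeds and a probabilistic mapping from topics to channels for recommendation, but does not name the tools or algorithms used. The sketch below is one plausible minimal version of that pipeline; the choice of feedparser and gensim's LdaModel, the feed URLs, and all variable names are assumptions for illustration only.

    from collections import defaultdict

    import feedparser
    from gensim import corpora, models
    from gensim.utils import simple_preprocess

    FEEDS = {  # hypothetical RSS channels
        "tech-news": "https://example.com/tech/rss",
        "gardening": "https://example.com/garden/rss",
    }

    # One document per article, remembering which channel it came from.
    docs, channels = [], []
    for channel, url in FEEDS.items():
        for entry in feedparser.parse(url).entries:
            text = entry.get("title", "") + " " + entry.get("summary", "")
            docs.append(simple_preprocess(text))
            channels.append(channel)

    # Unsupervised topic discovery across all articles.
    dictionary = corpora.Dictionary(docs)
    bow_corpus = [dictionary.doc2bow(d) for d in docs]
    lda = models.LdaModel(bow_corpus, num_topics=20, id2word=dictionary, passes=5)

    # Probability of each topic per channel, averaged over the channel's articles.
    channel_topic = defaultdict(lambda: defaultdict(float))
    article_count = defaultdict(int)
    for bow, channel in zip(bow_corpus, channels):
        article_count[channel] += 1
        for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            channel_topic[channel][topic_id] += prob
    for channel, topics in channel_topic.items():
        for topic_id in topics:
            topics[topic_id] /= article_count[channel]

    # Rank channels for a user whose discovered preferences are, say, topics 3 and 7.
    user_topics = {3, 7}
    ranked = sorted(channel_topic,
                    key=lambda c: sum(channel_topic[c][t] for t in user_topics),
                    reverse=True)
    print(ranked)

The ranked channels would then feed the personalized audio reader, with the highest-scoring channels converted to speech first; the topic-hierarchy and trending-topic components described in the abstract would layer on top of this basic topic-to-channel mapping.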