Phase II year
2014
(last award dollars: 2016)
This SBIR Phase II project will research algorithms that automatically discover topics of interest to a user from existing written content such as news and blogs. These topics can then be used to dynamically create a personalized content reader with high-quality synthesized audio. Applications built on this cloud-based project will let a car driver get instant, relevant information without taking her eyes off the road, a must given today's lengthy commutes and the need for in-car safety. Other applications will give the same audio access to Internet news and blogs via a smartphone for those who are vision-impaired or vision-busy with exercising, gardening, etc. The project has a societal impact for blind users, for people with aging eyes who have difficulty reading small print and small screens, and for car drivers, since it lets them find contextual news and blogs in an "eyes-free" manner with an easy listening experience. The fundamental research components of this project can be re-applied to other content and to similar fields of research. The project's applications have the potential to generate a revenue stream, which will in turn create jobs and have an overall impact on the economy.
The goal of the research is to determine whether topic information extracted from a large corpus of unrelated documents using unsupervised machine learning can be used for content discovery, for improving the quality of synthesized speech, and for discovering user preferences for a recommendation system. The research uses topic modeling, an unsupervised machine learning technique, to uncover topics across thousands of RSS feeds, and natural language processing to improve the quality of synthesized speech. Specific RSS channels can then be retrieved according to a user's topic preferences by using the probabilities that map topics to RSS channels.
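The retrieval step described above can be illustrated with a minimal sketch. It assumes that a topic model has already produced, for each RSS channel, a probability distribution over topics, and that a user's preferences are expressed as topic weights. The function name `rank_channels`, the feed names, and all numbers are hypothetical and chosen only for illustration; the abstract does not specify the scoring rule, so a simple weighted sum is assumed here.

```python
from typing import Dict, List, Tuple


def rank_channels(
    user_prefs: Dict[str, float],
    channel_topics: Dict[str, Dict[str, float]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank channels by how well their topic mix matches the user's preferences.

    Assumption: the score is the sum over topics of
    (user weight for the topic) * (probability of the topic in the channel).
    """
    scores = []
    for channel, topic_probs in channel_topics.items():
        score = sum(
            weight * topic_probs.get(topic, 0.0)
            for topic, weight in user_prefs.items()
        )
        scores.append((channel, score))
    # Highest score first; ties are broken alphabetically by channel name.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical topic distributions per channel (each sums to 1).
channel_topics = {
    "feed_a": {"sports": 0.7, "politics": 0.2, "tech": 0.1},
    "feed_b": {"sports": 0.1, "politics": 0.1, "tech": 0.8},
    "feed_c": {"sports": 0.2, "politics": 0.6, "tech": 0.2},
}

# Hypothetical user preference weights over topics.
user_prefs = {"tech": 0.75, "politics": 0.25}

ranked = rank_channels(user_prefs, channel_topics, top_n=2)
for channel, score in ranked:
    print(f"{channel}: {score:.3f}")
```

With these illustrative numbers, the technology-heavy channel ranks first and the politics-heavy channel second, which matches the stated preference weights.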
Because scanning content by listening is slower than scanning it visually for relevant results, the proposed research will improve this process by combining user preferences with information retrieval for a better user experience. The topic discovery research will include three key components: discovering multiple levels of subtopics to create topic hierarchies for easier browsing, a method for identifying trending topics, and determining current and relevant topics for automating audio content. Once this project is shown to produce effective results, the same techniques can be applied to other document collections.
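The abstract does not say how trending topics are identified. As one possible illustration only, the sketch below flags a topic as trending when its share of recent documents is at least a chosen multiple of its share in an older baseline window. The function name `trending_topics`, the `min_ratio` and `smoothing` parameters, and the sample data are all hypothetical, and each document is assumed to carry a single dominant topic label.

```python
from collections import Counter
from typing import Dict, List


def trending_topics(
    baseline: List[str],
    recent: List[str],
    min_ratio: float = 2.0,
    smoothing: float = 1.0,
) -> Dict[str, float]:
    """Return topics whose recent share is at least min_ratio times their baseline share.

    Each list holds one dominant-topic label per document. Additive smoothing
    keeps the ratio finite for topics that are absent from one of the windows.
    """
    base_counts = Counter(baseline)
    recent_counts = Counter(recent)
    topics = set(base_counts) | set(recent_counts)
    k = len(topics)
    result = {}
    for topic in topics:
        base_share = (base_counts[topic] + smoothing) / (len(baseline) + smoothing * k)
        recent_share = (recent_counts[topic] + smoothing) / (len(recent) + smoothing * k)
        ratio = recent_share / base_share
        if ratio >= min_ratio:
            result[topic] = ratio
    return result


# Hypothetical dominant-topic labels for documents in two time windows.
baseline = ["sports"] * 6 + ["politics"] * 3 + ["weather"] * 1
recent = ["sports"] * 2 + ["politics"] * 2 + ["weather"] * 6

trending = trending_topics(baseline, recent)
print(trending)
```

In this made-up data only "weather" grows enough between the two windows to pass the threshold. A production approach would likely work with full topic distributions per document and a more careful statistical test.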