PROSODY DEPENDENT SPEECH RECOGNITION ON RADIO NEWS

2003 
Does prosody help word recognition? Humans listening to natural prosody, as opposed to monotone or foreign prosody, are able to understand the content with lower cognitive load and higher accuracy [1]. For automatic Large Vocabulary Continuous Speech Recognition (LVCSR), the answer is not as straightforward. Even though successful word recognition and successful prosody recognition have been demonstrated independently in many academic and commercial applications, no result has been reported in the literature that shows improved word recognition on a large-vocabulary continuous speech recognition task with the help of prosody. In 1997, Kompe [2] presented a theoretical argument that prosody can never improve word recognition accuracy unless the recognizer uses prosody-dependent models. In this paper, we propose a novel probabilistic framework in which words and phonemes are dependent on prosody in a way that improves word recognition. We propose the use of prosody-dependent allophones based on the "hidden mode variable" theory of Ostendorf et al. [3], but with prosody dependence carefully restricted to the subset of distributions that are known to be most sensitive to prosodic context. Specifically, we propose to model prosody dependence of the phoneme duration probability density functions (PDFs), the acoustic-prosodic observation PDFs, and the language model, and to ignore prosody dependence of the acoustic-phonetic observation PDFs. In so doing, we create effective models of the most striking and most often reported prosody-dependent allophonic variation, without significantly increasing the parameter count of the speech recognizer.
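The key design decision above is selective parameter tying: each allophone keeps a single acoustic-phonetic observation PDF across prosodic contexts, while its duration PDF (and the language model) is conditioned on a prosody tag. The sketch below illustrates this factorization in a minimal, hypothetical form; the distributions, parameter values, and the binary accented/unaccented tag are illustrative assumptions, not the paper's actual models.

```python
# Minimal sketch (not the paper's implementation) of a prosody-dependent
# allophone: the acoustic-phonetic observation PDF is SHARED across
# prosodic contexts, while the duration PDF is conditioned on a binary
# prosody tag ("accented" vs. "unaccented"). All numbers are invented
# for illustration.
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

class ProsodyDependentAllophone:
    def __init__(self, phone, acoustic_mean, acoustic_var, duration_means):
        self.phone = phone
        # One acoustic-phonetic PDF shared by all prosodic variants,
        # so prosody dependence barely increases the parameter count.
        self.acoustic_mean = acoustic_mean
        self.acoustic_var = acoustic_var
        # Separate duration PDFs per prosody tag (accented phones are
        # typically longer), modeled here as Gaussians over frame counts.
        self.duration_means = duration_means

    def log_score(self, frames, duration, prosody):
        # Acoustic term is independent of prosody; duration term is not.
        acoustic = sum(gaussian_logpdf(f, self.acoustic_mean, self.acoustic_var)
                       for f in frames)
        dur = gaussian_logpdf(duration, self.duration_means[prosody], 4.0)
        return acoustic + dur

# Hypothetical allophone of /a/: shared acoustic PDF, prosody-split durations.
a = ProsodyDependentAllophone("a", acoustic_mean=0.0, acoustic_var=1.0,
                              duration_means={"accented": 12.0,
                                              "unaccented": 7.0})
frames = [0.1, -0.2, 0.05]
# An 11-frame token fits the accented duration model better, since the
# acoustic term is identical under both prosodic variants.
print(a.log_score(frames, 11, "accented") > a.log_score(frames, 11, "unaccented"))
```

In a full recognizer this per-phone score would be combined with a prosody-dependent language model inside the decoder; the point of the sketch is only that the prosody tag reweights duration (and, in the paper, acoustic-prosodic and language-model scores) without duplicating the acoustic-phonetic Gaussians.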