Using Prosody in ASR: the Segmentation of Broadcast Radio News

2002 
This study explores how prosodic information can be used in Automatic Speech Recognition (ASR). A system was built which automatically identifies topic boundaries in a corpus of broadcast radio news. We evaluate the effectiveness of different types of features in that system, including textual, durational, F0, Tilt and ToBI features. These features were suggested by a review of the linguistic and natural language processing literature on how speakers indicate topic structure and how both humans and machines recognise it. In particular, we investigate whether acoustic cues to prosodic information can be used directly to indicate topic structure, or whether it is better to derive discourse structure from intonational events, such as ToBI events, in a manner suggested by Steedman’s (2000) theory, among others. It was found that the global F0 properties of an utterance (mean and maximum F0) and textual features (based on Hearst’s (1997) lexical scores and cue phrases) were each effective in recognising topic boundaries on their own, whereas all other features investigated were not. Performance using Tilt and ToBI features was disappointing, although this could have been because of inaccuracies in estimating these parameters. We suggest that particular acoustic cues to prosody are more effective in recognising discourse information at some levels of discourse structure than at others, and that the identification of higher level structure is informed by the properties of lower level structure. Although the findings of this study were not conclusive on this issue, we propose that prosody in ASR and synthesis should be represented in terms of the intonational events relevant to each level of discourse structure. Further, at the level of topic structure, a taxonomy of events is needed to describe the global F0 properties of each utterance that makes up that structure.
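As an illustration only, the two feature families reported as effective can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the function names are hypothetical, the F0 contour is assumed to be a list of per-frame values in Hz with unvoiced frames coded as 0, and the lexical score is a plain cosine similarity between adjacent word blocks, in the spirit of Hearst's (1997) block comparison.

```python
import math
from collections import Counter


def global_f0_features(f0_hz):
    """Mean and maximum F0 of one utterance, over voiced frames only.

    Assumption: unvoiced frames are coded as 0 and are skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return {"mean_f0": 0.0, "max_f0": 0.0}
    return {"mean_f0": sum(voiced) / len(voiced), "max_f0": max(voiced)}


def lexical_similarity(left_words, right_words):
    """Cosine similarity between word-count vectors of two adjacent blocks.

    A low score between neighbouring blocks hints at a vocabulary shift,
    which is one possible cue to a topic boundary.
    """
    a, b = Counter(left_words), Counter(right_words)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values()) *
                     sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

A boundary detector could feed these values, computed per utterance and per pair of adjacent utterances, into any classifier; the choice of classifier is left open here because the abstract does not specify one.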