Language Model Adaptation for Emotional Speech Recognition using Tweet data

2020 
Generally, emotional speech recognition is considered more difficult than non-emotional speech recognition. This is because the acoustic features of emotional speech differ from those of non-emotional speech, and they vary greatly depending on the emotion type and intensity. In addition, it is difficult to recognize the colloquial expressions found in emotional utterances using a language model trained on a corpus such as lecture speech. We have been studying emotional speech recognition on an emotional speech corpus, Japanese Twitter-based emotional speech (JTES). This corpus consists of tweets collected from Twitter, with an emotional label assigned to each sentence. In this study, we aim to improve the performance of emotional speech recognition on the JTES through language model adaptation, which requires a text corpus containing emotional and colloquial expressions. However, no such large-scale Japanese corpus exists. To solve this problem, we propose language model adaptation using tweet data, which is expected to contain many emotional and colloquial expressions. The sentences used for adaptation were extracted from the collected tweet data based on a set of rules. After this rule-based filtering, a large corpus of 25.86M words was obtained. In the recognition experiments, the baseline word error rate was 36.11%, whereas that of the language model adaptation was 25.68%. Combining acoustic model adaptation with language model adaptation further reduced the word error rate to 17.77%. These results establish the effectiveness of the proposed method.
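The abstract mentions that adaptation sentences were extracted from raw tweet data "based on a set of rules" without specifying them. As a minimal sketch, the concrete rules below (dropping retweets, stripping URLs, mentions, and hashtags, discarding near-empty remnants) are hypothetical illustrations of this kind of rule-based filtering, not the paper's actual pipeline:

```python
import re


def filter_tweets(tweets):
    """Rule-based filtering of raw tweet text for LM adaptation.

    NOTE: these specific rules are assumptions for illustration;
    the paper only states that sentences were extracted
    "based on a set of rules".
    """
    kept = []
    for text in tweets:
        # Drop retweets, which duplicate other tweets' content.
        if text.startswith("RT "):
            continue
        # Strip URLs, @-mentions, and hashtags, which are not
        # useful for a speech-recognition language model.
        text = re.sub(r"https?://\S+", "", text)
        text = re.sub(r"[@#]\w+", "", text)
        # Normalize whitespace left behind by the removals.
        text = " ".join(text.split())
        # Discard near-empty remnants.
        if len(text) < 3:
            continue
        kept.append(text)
    return kept


# Example: only the second tweet survives filtering.
print(filter_tweets(["RT do not keep", "hello @user http://a.b world!", "ok"]))
```

A filter like this is typically followed by language-specific normalization and word segmentation (for Japanese, a morphological analyzer) before the adaptation corpus is used to train or interpolate the n-gram language model.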