The Generalization Effect for Multilingual Speech Emotion Recognition across Heterogeneous Languages

2019 
Regularization approaches, such as multi-task learning and dropout, prevent overfitting and improve generalization ability. Speech emotion recognition suffers from a shortage of transcribed databases, in which labels are annotated subjectively. Because emotion is expressed in a more universally recognized way than language itself, the paralinguistic feature space of emotional speech can generalize better, even across substantially heterogeneous languages. We investigate the effect of regularization and normalization frameworks on two emotional speech databases, IEMOCAP for English and JTES for Japanese. We obtain absolute gains in unweighted average recall over ten runs (1.48% for IEMOCAP and 1.03% for JTES) and achieve a maximum of 59.49% on IEMOCAP. From comparative experiments, we confirm that dropout and multi-task learning strategies are effective for multilingual speech emotion recognition, and that common normalization over the two languages leads to further improvement under all conditions, suggesting that better generalization can be obtained even when two highly heterogeneous languages are merged.
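A minimal sketch of the two ideas named in the abstract: common normalization computed over the merged English and Japanese feature sets, and a small network regularized with dropout and a multi-task head. This is not the authors' implementation; the feature dimensionality, the auxiliary language-identification task, and the loss weighting are illustrative assumptions only.

```python
import numpy as np
import torch
import torch.nn as nn

# --- common normalization over both corpora (placeholder acoustic feature vectors) ---
feats_en = np.random.randn(200, 88).astype(np.float32)   # stand-in for IEMOCAP features
feats_jp = np.random.randn(200, 88).astype(np.float32)   # stand-in for JTES features
merged = np.concatenate([feats_en, feats_jp], axis=0)
mean, std = merged.mean(axis=0), merged.std(axis=0) + 1e-8
feats_en = (feats_en - mean) / std                        # same statistics applied to both languages
feats_jp = (feats_jp - mean) / std

# --- multi-task model: shared trunk with dropout, emotion head + assumed auxiliary language head ---
class MultiTaskSER(nn.Module):
    def __init__(self, in_dim=88, hidden=256, n_emotions=4, n_langs=2, p_drop=0.5):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
        )
        self.emotion_head = nn.Linear(hidden, n_emotions)    # primary task: emotion classes
        self.language_head = nn.Linear(hidden, n_langs)      # auxiliary task (assumption)

    def forward(self, x):
        h = self.trunk(x)
        return self.emotion_head(h), self.language_head(h)

model = MultiTaskSER()
x = torch.from_numpy(np.concatenate([feats_en, feats_jp], axis=0))
emo_labels = torch.randint(0, 4, (x.size(0),))              # placeholder emotion labels
lang_labels = torch.cat([torch.zeros(200), torch.ones(200)]).long()

emo_logits, lang_logits = model(x)
loss = nn.functional.cross_entropy(emo_logits, emo_labels) \
     + 0.3 * nn.functional.cross_entropy(lang_logits, lang_labels)  # weighted multi-task loss
loss.backward()
```

The shared trunk with dropout plays the role of the regularization discussed in the abstract, while computing normalization statistics on the pooled corpora corresponds to the "common normalization over two languages" setting; the 0.3 auxiliary-loss weight is an arbitrary placeholder.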