Cross-culture Continuous Emotion Recognition with Multimodal Features

2019 
Automatic emotion recognition is a challenging task that can have a great impact on improving natural human-computer interaction. In this paper, we present our approach to the automatic prediction of dimensional emotional states for the Cross-cultural Emotion Sub-challenge of AVEC 2018, which uses multiple features and fusion across the visual, audio, and text modalities. Single-feature predictions are first modeled with support vector regression (SVR); the multimodal fusion of these predictions is then performed with a multiple linear regression model. Besides the baseline features, we extract unigram and bigram features from the text and several types of convolutional neural network (CNN) features from the video. Our multimodal fusion reaches a concordance correlation coefficient (CCC) of 0.599 on the development set for arousal, 0.617 for valence, and 0.289 for likability.
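The following is a minimal sketch of the two-stage pipeline the abstract describes: per-feature SVR models whose predictions are fused with a multiple linear regression, with unigram/bigram text features extracted via a bag-of-words counter. The feature dimensions, transcripts, labels, and hyperparameters below are hypothetical placeholders; the paper does not specify them here, and the actual AVEC 2018 features would be substituted in practice.

```python
# Sketch only: per-modality SVR -> linear-regression fusion -> CCC.
# All data below is synthetic stand-in data, not the AVEC 2018 corpus.
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.feature_extraction.text import CountVectorizer

rng = np.random.default_rng(0)
n_train, n_dev = 200, 50

# Hypothetical audio and video (CNN) feature matrices and transcripts.
X_audio_tr = rng.normal(size=(n_train, 88))
X_audio_dev = rng.normal(size=(n_dev, 88))
X_video_tr = rng.normal(size=(n_train, 512))
X_video_dev = rng.normal(size=(n_dev, 512))
texts_tr = ["example transcript"] * n_train
texts_dev = ["example transcript"] * n_dev
y_tr = rng.uniform(-1, 1, n_train)   # e.g. gold arousal labels
y_dev = rng.uniform(-1, 1, n_dev)

# Unigram + bigram bag-of-words features from the transcripts.
vec = CountVectorizer(ngram_range=(1, 2))
X_text_tr = vec.fit_transform(texts_tr).toarray()
X_text_dev = vec.transform(texts_dev).toarray()

# Stage 1: one SVR per feature set, each producing a unimodal prediction.
preds_tr, preds_dev = [], []
for X_tr, X_dv in [(X_audio_tr, X_audio_dev),
                   (X_video_tr, X_video_dev),
                   (X_text_tr, X_text_dev)]:
    svr = SVR(kernel="linear", C=1.0).fit(X_tr, y_tr)
    preds_tr.append(svr.predict(X_tr))
    preds_dev.append(svr.predict(X_dv))

# Stage 2: fuse the unimodal predictions with multiple linear regression.
P_tr = np.column_stack(preds_tr)
P_dev = np.column_stack(preds_dev)
fusion = LinearRegression().fit(P_tr, y_tr)
y_hat = fusion.predict(P_dev)

# Evaluate with the concordance correlation coefficient (CCC),
# the metric quoted in the abstract.
def ccc(y_true, y_pred):
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    return 2 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)

print("dev CCC:", ccc(y_dev, y_hat))
```

In this setup the fusion weights are learned on the training-set predictions of each unimodal SVR; a separate model of this form would be trained per target dimension (arousal, valence, likability).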