Audio-visual emotion recognition using deep transfer learning and multiple temporal models

2017 
This paper presents the techniques used in our contribution to the video-based sub-challenge of Emotion Recognition in the Wild 2017. The purpose of the sub-challenge is to classify video clips into seven categories: the six basic emotions (angry, sad, happy, surprise, fear and disgust) and neutral. Our proposed solution uses three state-of-the-art techniques to overcome the challenges of emotion recognition in the wild. Deep network transfer learning is used for feature extraction. Spatial-temporal model fusion exploits the complementarity of the different networks. Semi-automatic reinforcement learning optimizes the fusion strategy based on the dynamic external feedback given by the challenge organizers. The overall accuracy of the proposed approach on the challenge test dataset is 57.2%, which is better than the challenge baseline of 40.47%.
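The abstract does not state the fusion rule, so the following is only a minimal sketch of one common form of model fusion, assuming score-level (late) fusion by a weighted average of per-model class probabilities. The function names, the weights, and the example probabilities are hypothetical and are not taken from the paper.

```python
# Illustrative late-fusion sketch; NOT the authors' implementation.
# Assumption: each model outputs one probability per emotion class,
# and fusion is a normalized weighted average of those outputs.

EMOTIONS = ["angry", "sad", "happy", "surprise", "fear", "disgust", "neutral"]


def fuse_scores(model_probs, weights):
    """Return the weighted average of per-model class probabilities."""
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(EMOTIONS)
    for probs, weight in zip(model_probs, weights):
        if len(probs) != len(EMOTIONS):
            raise ValueError("each model must score every emotion class")
        for i, p in enumerate(probs):
            fused[i] += weight * p / total
    return fused


def predict(model_probs, weights):
    """Return the emotion label with the highest fused score."""
    fused = fuse_scores(model_probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Hypothetical outputs of a spatial model and a temporal model for one clip.
spatial_probs = [0.10, 0.10, 0.50, 0.10, 0.05, 0.05, 0.10]
temporal_probs = [0.40, 0.10, 0.20, 0.10, 0.05, 0.05, 0.10]

equal_weight_label = predict([spatial_probs, temporal_probs], [0.5, 0.5])
temporal_heavy_label = predict([spatial_probs, temporal_probs], [0.2, 0.8])
```

In this sketch the predicted label changes with the weights, which is why the choice of fusion weights matters; how the paper tunes its fusion strategy from organizer feedback is not described in the abstract.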