Bi-modality Fusion for Emotion Recognition in the Wild

2019 
Emotion recognition in the wild has been a hot research topic in the field of affective computing. Although some progress has been made, emotion recognition in the wild remains an unsolved problem due to challenges such as head movement, face deformation, and illumination variation. To deal with these unconstrained challenges, we propose a bi-modality fusion method for video-based emotion recognition in the wild. The proposed framework takes advantage of the visual information from facial expression sequences and the speech information from audio. State-of-the-art CNN-based object recognition models are employed to improve facial expression recognition performance, and a bidirectional long short-term memory (Bi-LSTM) network is employed to capture the dynamics of the learned features. Additionally, to take full advantage of the facial expression information, a VGG16 network is trained on the AffectNet dataset to learn a specialized facial expression recognition model. On the audio side, features such as low-level descriptors (LLDs) and deep features extracted from spectrogram images are also developed to improve emotion recognition performance. Our best result achieves an overall accuracy of 62.78% on the Test set of the EmotiW challenge, which outperforms the best result of EmotiW 2018 and ranked 2nd in the EmotiW 2019 challenge.
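As an illustration of the pipeline the abstract describes, below is a minimal sketch, assuming PyTorch/torchvision. The per-frame VGG16 encoder (fine-tuned on AffectNet in the paper), the Bi-LSTM over frame features, and the fusion with an audio feature vector follow the abstract; however, the abstract does not specify the fusion mechanism, so the late concatenation used here, as well as the class count (7, the AFEW emotion categories used by EmotiW), hidden size, and audio feature dimension, are placeholder assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class BiModalEmotionNet(nn.Module):
    """Hypothetical sketch of the described bi-modality pipeline:
    per-frame VGG16 features -> Bi-LSTM temporal modeling, fused with
    a precomputed audio feature vector (e.g., LLDs or spectrogram
    CNN features) by simple concatenation (an assumption)."""

    def __init__(self, num_classes=7, audio_dim=1582, hidden=256):
        super().__init__()
        # VGG16 backbone; in the paper it is trained on AffectNet.
        vgg = models.vgg16(weights=None)
        # Keep everything up to the penultimate FC layer -> 4096-d per frame.
        self.frame_encoder = nn.Sequential(
            vgg.features,
            nn.Flatten(),
            *list(vgg.classifier.children())[:-1],
        )
        # Bi-LSTM captures dynamics across the facial expression sequence.
        self.bilstm = nn.LSTM(4096, hidden, batch_first=True,
                              bidirectional=True)
        # Classifier over the fused video + audio representation.
        self.classifier = nn.Linear(2 * hidden + audio_dim, num_classes)

    def forward(self, frames, audio_feats):
        # frames: (B, T, 3, 224, 224); audio_feats: (B, audio_dim)
        b, t = frames.shape[:2]
        f = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.bilstm(f)
        # Concatenate the final forward and backward hidden states.
        video_repr = torch.cat([h[-2], h[-1]], dim=1)
        return self.classifier(torch.cat([video_repr, audio_feats], dim=1))
```

A weighted score-level fusion of separately trained visual and audio classifiers would be an equally plausible reading of "bi-modality fusion"; the concatenation above is only one way to realize it.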