Two-Level Attention with Multi-task Learning for Facial Emotion Estimation

2019 
Valence-Arousal model can represent complex human emotions, including slight changes of emotion. Most prior works of facial emotion estimation only considered laboratory data and used video, speech or other multi-modal features. The effect of these methods applied on static images in the real world is unknown. In this paper, a two-level attention with multi-task learning (MTL) framework is proposed for facial emotion estimation on static images. The features of corresponding region were automatically extracted and enhanced by first-level attention mechanism. And then we designed a practical structure to process the features extracted by first-level attention. In the following, we utilized Bi-directional Recurrent Neural Network (Bi-RNN) with self-attention (second-level attention) to make full use of the relationship of these features adaptively. It can be concluded as a combination of global and local information. In addition, we exploited MTL to estimate the value of valence and arousal simultaneously, which employed the correlation of the two tasks. The quantitative results conducted on AffectNet dataset demonstrated the superiority of the proposed framework. In addition, extensive experiments were carried out to analysis effectiveness of different components.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    23
    References
    3
    Citations
    NaN
    KQI
    []