Domain Adaptive Representation Learning for Facial Action Unit Recognition

2019 
Abstract: Learning robust representations for applications with multiple input modalities can have a significant impact on performance. Traditional representation learning methods rely on projecting the input modalities into a common subspace to maximize agreement amongst the modalities for a particular task. We propose a novel approach to representation learning that uses a latent representation decoder to reconstruct the target modality, thereby employing the target modality purely as a supervision signal for discovering correlations between the modalities. Through cross-modality supervision, we demonstrate that the learnt representation improves performance on facial action unit (AU) recognition over modality-specific representations and even their fused counterparts. As an extension, we explore a new transfer learning technique to adapt the learnt representation to the target domain. We also present a shared-representation-based feature fusion methodology to improve the performance of any multi-modal system. Our experiments on three AU recognition datasets (MMSE, BP4D and DISFA) show strong performance gains, producing state-of-the-art results despite the absence of data from a modality.
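As a rough illustration of the cross-modality supervision idea summarized above, the sketch below shows one way such a model could be wired up. All concrete choices here are assumptions for illustration only (PyTorch, MLP encoder/decoder, feature dimensions, loss weights, and binary cross-entropy for multi-label AU recognition); they are not taken from the paper. The key point it demonstrates is that the target modality enters only through a reconstruction loss during training, so it is not required at inference time.

```python
# Minimal sketch of cross-modality supervision for AU recognition.
# Assumptions (not from the paper): feature dimensions, layer sizes,
# loss weighting, and the MLP architecture are all illustrative.
import torch
import torch.nn as nn


class CrossModalAUNet(nn.Module):
    def __init__(self, src_dim=2048, tgt_dim=512, latent_dim=256, num_aus=12):
        super().__init__()
        # Encoder: source-modality features -> latent representation
        self.encoder = nn.Sequential(
            nn.Linear(src_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim), nn.ReLU(),
        )
        # Decoder: latent representation -> reconstruction of target-modality features
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, tgt_dim),
        )
        # AU classifier head operating on the latent representation
        self.classifier = nn.Linear(latent_dim, num_aus)

    def forward(self, src_feats):
        z = self.encoder(src_feats)
        return self.classifier(z), self.decoder(z)


def training_step(model, src_feats, tgt_feats, au_labels, recon_weight=1.0):
    """Joint objective: multi-label AU loss + cross-modal reconstruction loss.

    The reconstruction term uses the target modality purely as a supervision
    signal; the classifier never sees target-modality features directly.
    """
    au_logits, tgt_recon = model(src_feats)
    au_loss = nn.functional.binary_cross_entropy_with_logits(au_logits, au_labels)
    recon_loss = nn.functional.mse_loss(tgt_recon, tgt_feats)
    return au_loss + recon_weight * recon_loss


if __name__ == "__main__":
    model = CrossModalAUNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Dummy batch: 8 samples of source/target features and 12 binary AU labels
    src = torch.randn(8, 2048)
    tgt = torch.randn(8, 512)
    labels = torch.randint(0, 2, (8, 12)).float()
    loss = training_step(model, src, tgt, labels)
    loss.backward()
    opt.step()
    # At inference, only the source modality is needed:
    # au_logits, _ = model(src)
```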