Audio-Visual Emotion Recognition System for Variable Length Spatio-Temporal Samples Using Deep Transfer-Learning

2020 
Automatic emotion recognition is renowned for being a difficult task, even for human intelligence. Because classification problems require sufficient data, we introduce a framework developed to generate labeled audio and build our own database. In this paper we present a new model for audio-video emotion recognition using Transfer Learning (TL). The idea is to combine a pre-trained high-level feature extractor, a Convolutional Neural Network (CNN), with a Bidirectional Recurrent Neural Network (BRNN) to address the issue of variable-length sequence inputs. Throughout the design process we discuss the main problems arising from the high complexity of the task, due to its inherently subjective nature, as well as the strong results obtained by testing the model on different databases, outperforming state-of-the-art algorithms on the SAVEE [3] database. Furthermore, we use the proposed system to perform per-user precision classification in low-resource, real-world scenarios with promising results.
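The abstract outlines the core architectural idea: a pre-trained CNN acts as a frozen high-level feature extractor for each frame, and a BRNN aggregates the resulting feature sequence so that clips of different lengths can be classified. The following is a minimal sketch of that idea, not the authors' implementation: it assumes PyTorch, a ResNet-18 backbone with ImageNet weights, a single-layer bidirectional GRU, and seven emotion classes, all of which are illustrative choices rather than details taken from the paper.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence
from torchvision import models


class CnnBrnnEmotionNet(nn.Module):
    """Hypothetical CNN + BRNN model for variable-length emotion clips."""

    def __init__(self, num_emotions: int = 7, rnn_hidden: int = 128):
        super().__init__()
        # Transfer learning: reuse a pretrained CNN as a fixed high-level
        # feature extractor (drop its classification layer, freeze weights).
        backbone = models.resnet18(weights="IMAGENET1K_V1")
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> (B*T, 512, 1, 1)
        for p in self.cnn.parameters():
            p.requires_grad = False
        # Bidirectional RNN consumes the per-frame feature sequence,
        # handling variable lengths via packed sequences.
        self.brnn = nn.GRU(512, rnn_hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * rnn_hidden, num_emotions)

    def forward(self, frames: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W), zero-padded to the longest clip in the batch
        # lengths: true number of frames per clip
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1)   # (B*T, 512)
        feats = feats.view(b, t, -1)                         # (B, T, 512)
        packed = pack_padded_sequence(feats, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        _, h_n = self.brnn(packed)                           # (2, B, rnn_hidden)
        clip_repr = torch.cat([h_n[-2], h_n[-1]], dim=1)     # final fwd + bwd states
        return self.head(clip_repr)                          # emotion logits


if __name__ == "__main__":
    model = CnnBrnnEmotionNet()
    clips = torch.randn(2, 10, 3, 224, 224)   # two clips padded to 10 frames
    lengths = torch.tensor([10, 6])            # actual frame counts
    print(model(clips, lengths).shape)         # torch.Size([2, 7])
```

In this sketch the packed-sequence mechanism is what lets a single batch mix clips of different durations; an equivalent design could instead pool the BRNN outputs over time or fine-tune the CNN backbone, both of which remain consistent with the transfer-learning framing described above.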