Speech emotion recognition based on multi-task learning using a convolutional neural network

2017 
In this paper, we propose a speech emotion recognition (SER) method based on a multi-task learning convolutional neural network (MTL-CNN). Classifiers built on deep neural networks (DNNs) have recently been reported to outperform those based on the hidden Markov model (HMM) and the support vector machine (SVM). However, DNN-based classifiers still suffer from generalization error when training data are limited. To mitigate this problem, the proposed method incorporates multi-task learning (MTL) as a form of transfer learning: in addition to the main emotion classification task, the MTL-CNN classifies arousal level, valence level, and gender as three auxiliary tasks. Training the main task jointly with these auxiliary tasks helps the MTL-CNN learn useful features and the relationships between tasks. SER experiments on the Berlin Database of Emotional Speech show that the proposed MTL-CNN achieves a relative F1-score improvement of 3.64% over a CNN trained on the single emotion recognition task.
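The abstract does not state how the four task losses are combined. One common choice in multi-task learning is a weighted sum of per-task cross-entropy losses, and the sketch below illustrates that choice only; it is not the authors' formulation. The class counts in `TASKS`, the `aux_weight` parameter, and all function names are hypothetical.

```python
import math

# Hypothetical number of classes per task; the abstract does not give these.
TASKS = {"emotion": 7, "arousal": 2, "valence": 2, "gender": 2}
AUX_TASKS = ("arousal", "valence", "gender")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the correct class."""
    return -math.log(softmax(logits)[label])


def multitask_loss(task_logits, task_labels, aux_weight=0.5):
    """Main emotion loss plus equally weighted auxiliary losses.

    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    loss = cross_entropy(task_logits["emotion"], task_labels["emotion"])
    for name in AUX_TASKS:
        loss += aux_weight * cross_entropy(task_logits[name], task_labels[name])
    return loss


# Usage: one utterance, with each output head producing uniform (all-zero) logits.
logits = {name: [0.0] * n for name, n in TASKS.items()}
labels = {"emotion": 3, "arousal": 1, "valence": 0, "gender": 1}
total = multitask_loss(logits, labels)
```

In a real network the per-task logits would come from separate output layers on top of shared convolutional features, so gradients from the auxiliary losses also update the shared layers. Setting `aux_weight` to 0 reduces the objective to the single-task baseline.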