Empirical evaluation of emotion classification accuracy for non-acted speech

2017 
Emotion recognition is important in the workplace because it impacts a multitude of outcomes, such as performance, engagement, and well-being. Emotion recognition from audio is an attractive option due to its non-obtrusive nature and the availability of microphones in workplace devices. We describe building a classifier that analyzes the para-linguistic features of audio streams to classify them into positive, neutral, and negative affect. Since speech in the workplace differs from acted speech, and because it is important that the training data be situated in the right context, we designed and executed an emotion induction procedure to generate a corpus of non-acted speech data from 33 speakers. The corpus was used to train a set of classification models, and a comparative analysis of these models was used to choose the feature parameters. Bootstrap aggregation (bagging) was then applied to the best combination of algorithm (Random Forest) and features (60 millisecond window size). The resulting classification accuracy of 73% is on par with, or exceeds, accuracies reported in the current literature for non-acted speech on a speaker-dependent test set. For reference, we also report the speaker-dependent recognition accuracy (95%) of the same classifier trained and tested on acted speech for three emotions in the Emo-DB database.
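
The abstract does not specify the toolchain used, so the following is only a minimal illustrative sketch of the described pipeline: frame-level para-linguistic features computed over 60 ms windows, fed to a bagged Random Forest for three-class affect classification. The use of librosa, MFCCs as a stand-in feature set, and scikit-learn's BaggingClassifier are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only; feature set and libraries are assumptions.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

def frame_features(wav_path, sr=16000, win_ms=60):
    """Extract MFCCs over ~60 ms analysis windows (hypothetical stand-in
    for the paper's para-linguistic feature set)."""
    y, sr = librosa.load(wav_path, sr=sr)
    n_fft = int(sr * win_ms / 1000)      # 60 ms analysis window
    hop = n_fft // 2                     # 50% overlap (assumed)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=hop)
    return mfcc.T                        # one feature row per frame

# X: stacked frame-level features; y: affect label per frame
# (0 = negative, 1 = neutral, 2 = positive). Dummy data shown here;
# in practice these would come from the induced-emotion corpus.
X = np.random.randn(1000, 13)
y = np.random.randint(0, 3, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Bootstrap aggregation over Random Forest base learners, as described
# in the abstract ("estimator" requires scikit-learn >= 1.2; older
# versions use the "base_estimator" keyword instead).
clf = BaggingClassifier(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    n_estimators=10,
    random_state=0,
)
clf.fit(X_tr, y_tr)
print("speaker-dependent test accuracy:", clf.score(X_te, y_te))
```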