Comparing the Performance of Facial Emotion Recognition Systems on Real-Life Videos: Gender, Ethnicity and Age

2021 
Dealing with non-verbal communication will be a key breakthrough for future technologies, as much of the effort of twenty-first-century technology has gone into handling numbers and verbal communication. The automatic recognition of facial expressions is of theoretical and commercial interest, and to this end there must exist video databases that incorporate the idiosyncrasies of human existence: ethnicity, gender and age. We compare the performance of three major emotion recognition software systems on real-life videos of politicians from across the world. Our sample of 45 videos (total length 2 h 26 min, 219,150 frames) comprises male and female politicians ranging in age from 40 to 78, with well-defined differences related to gender and nationality/ethnicity. The images are partially posed and partially spontaneous, reflecting the demeanour of politicians when they engage in speech-making. Our target systems, Microsoft Azure Cognitive Services Face API, Affectiva AFFDEX and Emotient FACET, have usually been trained on posed expressions, with limited testing on spontaneous images, so in effect we are operating at the edge of their performance. The systems perform similarly on some emotions, especially joy, but differ on others, such as anger. There are also differences related to gender, age and race. This is an important issue: as more and more video data becomes available, video analytics that can handle aspects of cognition such as emotion accurately and across cultural, gender and ethnic divides will be a major component of future technologies.
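To give a concrete sense of the kind of per-frame emotion scores such systems produce, the sketch below shows one plausible way to sample frames from a video and query the Microsoft Azure Face API for emotion attributes. This is not the authors' pipeline: it assumes the legacy azure-cognitiveservices-vision-face Python SDK and opencv-python, the file name speech.mp4, the AZURE_ENDPOINT / AZURE_KEY environment variables, and a one-frame-per-second sampling rate, all of which are illustrative choices rather than details taken from the paper.

```python
"""Minimal sketch: per-frame emotion scoring with the Azure Face API.

Assumes the legacy `azure-cognitiveservices-vision-face` SDK and `opencv-python`;
the video path, credentials, and 1-frame-per-second sampling are placeholder
choices, not taken from the paper.
"""
import io
import os

import cv2
from azure.cognitiveservices.vision.face import FaceClient
from azure.cognitiveservices.vision.face.models import FaceAttributeType
from msrest.authentication import CognitiveServicesCredentials

client = FaceClient(os.environ["AZURE_ENDPOINT"],
                    CognitiveServicesCredentials(os.environ["AZURE_KEY"]))

cap = cv2.VideoCapture("speech.mp4")   # hypothetical input video
fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
frame_idx = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Sample roughly one frame per second to limit API calls.
    if frame_idx % int(fps) == 0:
        ok_enc, buf = cv2.imencode(".jpg", frame)
        if ok_enc:
            faces = client.face.detect_with_stream(
                io.BytesIO(buf.tobytes()),
                return_face_attributes=[FaceAttributeType.emotion],
            )
            for face in faces:
                e = face.face_attributes.emotion
                # Per-category confidence scores in [0, 1].
                print(f"t={frame_idx / fps:6.1f}s "
                      f"joy={e.happiness:.2f} anger={e.anger:.2f} "
                      f"sadness={e.sadness:.2f} neutral={e.neutral:.2f}")
    frame_idx += 1

cap.release()
```

Affectiva AFFDEX and Emotient FACET expose analogous frame-level emotion scores through their own SDKs (commonly via the iMotions platform), which is what makes a direct per-emotion comparison across the three systems possible.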