Enabling Robots to Distinguish Between Aggressive and Joking Attitudes

2021 
During a conversation, the meaning of an utterance may change drastically depending on the attitude of the speaker. For example, offensive words may be used “seriously” to threaten or “jokingly” to tease. However, robots do not yet have the capacity to understand such nuance. We have therefore developed an attitude recognition system that allows robots to evaluate whether an utterance with offensive lexical content is aggressive (serious) or a joke. First, we created a data set of 7199 utterances from 16 participants that reproduces the different attitudes toward robots that we observed in field experiments. Second, we implemented voice quality features for the analysis of breathy voice, creaky voice (or vocal fry), and pressed voice, combined them with conventional prosodic features, and developed a neural network architecture to estimate the “perceived level of joking” of the utterances. Finally, we compared the performance of the proposed method to standard approaches for speech emotion recognition. We show that the proposed combination of voice quality and prosodic features outperforms, on this task, the conventional neural network used for speech emotion recognition. The proposed system predicts the “perceived level of joking” of an utterance with an accuracy comparable to that of a human listener.
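As a rough illustration of what “conventional prosodic features” can look like in practice, the following minimal sketch computes utterance-level statistics of fundamental frequency (F0) and frame energy from a raw waveform. It is not the feature extractor used in this work: the function names, frame sizes, F0 search range, and the autocorrelation-based pitch estimate are all assumptions chosen for simplicity, and the voice quality features (breathy, creaky, pressed) are not covered here.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (hypothetical helper)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def rms_energy(frames):
    """Root-mean-square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def f0_autocorr(frame, sr, fmin=75.0, fmax=400.0):
    """Crude F0 estimate: lag of the autocorrelation peak within [fmin, fmax].

    Assumption: the frame is voiced. A real system would also need a
    voicing decision and a more robust pitch tracker.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag

def prosodic_features(x, sr, frame_ms=40, hop_ms=10):
    """Utterance-level vector: [mean F0, std F0, mean energy, std energy]."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    f0 = np.array([f0_autocorr(f, sr) for f in frames])
    energy = rms_energy(frames)
    return np.array([f0.mean(), f0.std(), energy.mean(), energy.std()])

if __name__ == "__main__":
    # Synthetic stand-in for speech: a 200 Hz tone sampled at 16 kHz.
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 200.0 * t)
    print(prosodic_features(tone, sr))
```

A vector like this would typically be concatenated with other features before being passed to a classifier or regressor; how the features are combined in the proposed architecture is not specified in this abstract.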