Integration of Face and Voice During Emotion Perception: Is There Anything Gained for the Perceptual System Beyond Stimulus Modality Redundancy?

2013 
In this chapter, we review empirical data and theoretical models that have been put forward in the affective science literature to account for the perception of emotions when this process is accomplished simultaneously by sight and hearing. The visual component is provided by the configuration of the face, which undergoes geometric changes that in turn lead to different, discrete emotional facial expressions. The auditory component is provided by the voice and its changes in pitch, duration, and/or intensity, which lead to different affective tones of voice. Face–voice integration during emotion perception occurs when affective information conveyed by the two sensory modalities is integrated into a unified percept, or multisensory object. Although one may assume that the rapid and mandatory combination of multiple or complementary affective cues is adaptive (i.e., it likely reduces the effects of adverse factors like drifts or intrinsic noise), the central nervous system must nevertheless show some selectivity regarding which inputs from separate senses may eventually combine, as compared with merely redundant emotion signals. Indeed, not all spatial or temporal coincidences or co-occurrences lead to the perception of unified objects. Interestingly, results of behavioral studies confirm this conjecture and indicate that the combination of emotional facial expressions with affective prosody leads to the creation of genuinely multisensory emotional objects, which show different properties compared to the combination of an emotional facial expression with another redundant or distracting emotional facial expression, or with a written emotion word. Hence, the findings and models reviewed in this chapter suggest that some selectivity can be found in the way visual and auditory information is actually combined during emotion perception.
The rapid and automatic pairing of an emotional face with an affective voice might represent a naturalistic situation, in the sense that it requires no mediation by higher-level cognitive, attentional, or linguistic processes, which may be necessary for the efficient decoding of other stimulus categories or multisensory objects.