Processing speaker affect during spoken sentence comprehension

2013 
Anne van Leeuwen
Utrecht Institute of Linguistics OTS, Utrecht University

We often smile (and frown) while we talk. Speakers use facial expression, posture and prosody to provide additional cues that signal speaker stance: the affective attitude of the speaker towards the information he or she is providing, i.e., how the speaker feels about the situation or event being described. For example, it greatly matters whether “we’re having a baby” is said with a smile or with a frown. In order to respond adequately, listeners must therefore somehow weave both what is said and how it is said into a single coherent interpretation. In the current study we use event-related potentials (ERPs) to investigate when and how listeners integrate phonetic cues to speaker stance with the unfolding sentence meaning and the associated unfolding situation model. Do listeners pick up on these subtle phonetic cues to speaker stance at all? And if they do, can we find evidence that listeners rapidly relate these cues to a more sophisticated situation-model representation of what is being said, rather than relying on simple, word-based associations (e.g., a happy-sounding voice matching ‘award’)?

We explored this by presenting phonetically and semantically manipulated spoken Dutch sentences to listeners while collecting ERP measurements. The target materials consisted of utterances that contained a positive or negative content word. Utterances were phonetically manipulated using LPC analysis and resynthesis in Praat to obtain a smiling and a frowning version of each utterance (an illustrative sketch of this kind of formant manipulation appears after the abstract). We also varied the perspective of the sentence: the subject was either in the first person singular, referring to the speaker, or in the third person singular, referring to someone else. This resulted in valence-matching realizations (positive word spoken with a smile, negative word spoken with a frown) or valence-mismatching realizations (negative word spoken with a smile, positive word spoken with a frown), in either first or third person perspective. An example of a mismatching first-person item is ‘Ik heb een prijs gekregen’ (‘I got a prize’) spoken with a frown.

In general, we predict that listeners pick up on the audible cues to speaker stance, and that these cues lead them to expect something of corresponding valence. When the speaker talks about himself, we predict a clear mismatch effect for valence-mismatching words, because both the perceived expression and the sentence-level meaning convey affective information about the speaker. For utterances referring to someone other than the speaker, no such clear mismatch effect should be observed, at least not with the same magnitude or direction: in these sentences a smiling expression should not necessarily lead listeners to expect something of positive valence, because the speaker is not the subject of the event he describes; what matters is how the speaker feels about the event as a whole, including the person he is referring to.

Valence-mismatching words in speaker-centered sentences elicited a reliable positivity at about 600-900 ms after critical word onset. When the event referred to someone else, valence-mismatching words elicited a reliable negativity at about 500-900 ms after critical word onset. These results reveal that listeners can immediately detect whether speaker-stance cues, supplied by frowning or smiling phonetics, match or mismatch the typical valence of the unfolding sentence.
Furthermore, the results for the third-person sentences reveal that listeners rapidly relate these phonetic speaker-stance cues to a more sophisticated situation-model representation of what is being said, rather than relying on simple word-based associations. Thus, listeners rapidly integrate affective linguistic and affective paralinguistic cues into a sophisticated situation model (involving speaker, described event, and listener), which is presumably involved in subsequent responses.
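
To make the kind of phonetic manipulation described above more concrete, here is a minimal, illustrative Python sketch. It drives Praat from Python via the parselmouth library and uses Praat's built-in "Change gender" formant shift as a simple stand-in for the LPC analysis and resynthesis actually used in the study; the file names and shift ratios are hypothetical.

```python
# Illustrative sketch only: create a "smiling" (formants raised) and a
# "frowning" (formants lowered) version of a recorded utterance.
# Requires the parselmouth package (pip install praat-parselmouth).
# NOTE: the study used LPC analysis and resynthesis; Praat's "Change gender"
# command used here is a simpler stand-in, and all names/ratios are hypothetical.
import parselmouth
from parselmouth.praat import call


def shift_formants(wav_path: str, ratio: float) -> parselmouth.Sound:
    """Return a copy of the recording with its formants scaled by `ratio`.

    ratio > 1 raises formants (roughly the effect of spread, smiling lips);
    ratio < 1 lowers them (roughly the effect of rounded, frowning lips).
    """
    sound = parselmouth.Sound(wav_path)
    # Praat's "Change gender" arguments: pitch floor (Hz), pitch ceiling (Hz),
    # formant shift ratio, new pitch median (0 = keep original median),
    # pitch range factor, duration factor.
    return call(sound, "Change gender", 75, 600, ratio, 0, 1, 1)


if __name__ == "__main__":
    smile = shift_formants("ik_heb_een_prijs_gekregen.wav", 1.10)  # hypothetical file
    frown = shift_formants("ik_heb_een_prijs_gekregen.wav", 0.90)
    smile.save("prijs_smile.wav", "WAV")
    frown.save("prijs_frown.wav", "WAV")
```

Raising all formants mimics the shorter vocal tract produced by spread (smiling) lips, while lowering them mimics lip rounding or protrusion; the actual stimuli were created with LPC-based resynthesis rather than this shortcut.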