Audio-visual cues distinguishing self- from system-directed speech in younger and older adults

2005 
In spite of interest in developing robust open-microphone engagement techniques for mobile use and natural field contexts, there currently are no reliable techniques available. One problem is the lack of empirically-grounded models as guidance for distinguishing how users' audio-visual activity actually differs systematically when addressing a computer versus a human partner. In particular, existing techniques have not been designed to handle high levels of user self talk as a source of "noise," and they typically assume that a user is addressing the system only when facing it while speaking. In the present research, data were collected during two related studies in which adults aged 18-89 interacted multimodally using speech and pen with a simulated map system. Results revealed that people engaged in self talk prior to addressing the system over 30% of the time, with no decrease in younger adults' rate of self talk compared with elders. Speakers' amplitude was lower during 96% of their self talk, with a substantial 26 dBr amplitude separation observed between self- and system-directed speech. The magnitude of speakers' amplitude separation ranged from approximately 10-60 dBr and diminished with age, with 79% of the variance predictable simply by knowing a person's age. In contrast to the clear differentiation of intended addressee revealed by amplitude separation, gaze at the system was not a reliable indicator of speech directed to the system, with users looking at the system over 98% of the time during both self- and system-directed speech. Results of this research have implications for the design of more effective open-microphone engagement for mobile and pervasive systems.