Developmental Learning of Audio-Visual Integration From Facial Gestures of a Social Robot

2019 
We present a robot head with facial gestures, audio, and vision capabilities aimed at the emergence of infant-like social features. To this end, we propose a neural architecture that integrates these three modalities following a developmental stage of social interaction with a caregiver. During dyadic interaction with the experimenter, the robot learns to categorize audio-speech gestures for the vowels /a/, /i/, and /o/, as an infant would, by linking someone else's facial expressions to its own movements. We show that multimodal integration in the neural network is more robust than unimodal learning, as it compensates for erroneous or noisy information coming from each modality. As a result, facial mimicry with a partner can be reproduced using redundant audio-visual signals or noisy information from one modality alone. Statistical experiments with 24 naive participants show the robustness of our algorithm during human-robot interactions in a public environment where many people move and talk continuously. We then discuss our model in the light of human-robot communication and the development of social skills and language in infants.
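The abstract's central claim is that fusing audio and visual cues yields vowel categorization that degrades more gracefully than either modality alone when one channel is noisy. The sketch below is not the paper's architecture; it is a minimal illustration of that general idea on synthetic data, using an assumed feature layout (arbitrary audio/visual dimensions, Gaussian clusters per vowel) and a plain softmax classifier in place of the authors' neural network.

```python
# Illustrative sketch only (not the paper's model): fuse synthetic "audio" and
# "visual" features for three vowel classes /a/, /i/, /o/ and compare an
# audio-only classifier with a fused one when the audio channel is corrupted.
# The fused classifier typically degrades less, since the visual features
# remain informative. Dimensions and noise levels are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)
N_PER_CLASS, AUDIO_DIM, VISUAL_DIM, N_CLASSES = 200, 4, 4, 3

def make_modality(dim):
    """One Gaussian cluster per vowel class in a given modality."""
    centers = rng.normal(0.0, 2.0, size=(N_CLASSES, dim))
    X = np.vstack([centers[c] + rng.normal(0.0, 0.5, size=(N_PER_CLASS, dim))
                   for c in range(N_CLASSES)])
    y = np.repeat(np.arange(N_CLASSES), N_PER_CLASS)
    return X, y

audio, labels = make_modality(AUDIO_DIM)
visual, _ = make_modality(VISUAL_DIM)

def train_softmax(X, y, lr=0.1, epochs=300):
    """Plain softmax regression trained with batch gradient descent."""
    W = np.zeros((X.shape[1], N_CLASSES))
    b = np.zeros(N_CLASSES)
    Y = np.eye(N_CLASSES)[y]                      # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - Y) / len(X)                   # softmax cross-entropy gradient
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def accuracy(W, b, X, y):
    return np.mean(np.argmax(X @ W + b, axis=1) == y)

# Unimodal (audio only) vs. multimodal (audio + visual) classifiers.
W_a, b_a = train_softmax(audio, labels)
fused = np.hstack([audio, visual])
W_f, b_f = train_softmax(fused, labels)

# Corrupt the audio channel at test time, as in a noisy public environment.
noisy_audio = audio + rng.normal(0.0, 3.0, size=audio.shape)
noisy_fused = np.hstack([noisy_audio, visual])

print(f"audio-only, clean audio : {accuracy(W_a, b_a, audio, labels):.2f}")
print(f"audio-only, noisy audio : {accuracy(W_a, b_a, noisy_audio, labels):.2f}")
print(f"fused,      noisy audio : {accuracy(W_f, b_f, noisy_fused, labels):.2f}")
```

Running the script prints the three accuracies; the qualitative point is that the fused classifier retains discriminative visual information when the audio stream is degraded, which mirrors the compensation effect described in the abstract, not its specific mechanism.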