Group-Level Focus of Visual Attention for Improved Active Speaker Detection

2021 
This work addresses the problem of active speaker detection in physically situated multiparty interactions. This challenge requires a robust solution that can perform effectively across a wide range of speakers and physical contexts. Current state-of-the-art active speaker detection approaches rely on machine learning methods that do not generalize well to new physical settings. We find that these methods do not transfer well even between similar datasets. We propose the use of group-level focus of visual attention in combination with a general audio-video synchronizer method for improved active speaker detection across speakers and physical contexts. Our dataset-independent experiments demonstrate that the proposed approach outperforms state-of-the-art methods trained specifically for the task of active speaker detection.
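As a rough illustration of the idea (not the authors' implementation), the sketch below fuses two per-person signals for a single video frame: a group-level attention score (how many other participants are looking at a given person) and an audio-visual synchronization score for that person's face crop. All names, fusion weights, and thresholds here are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonFrame:
    """Per-person, per-frame signals (placeholder values, not from the paper)."""
    person_id: str
    attention_votes: int   # number of other participants gazing at this person
    av_sync_score: float   # audio-video sync confidence for this person's face crop


def pick_active_speaker(
    people: List[PersonFrame],
    group_size: int,
    sync_weight: float = 0.6,       # assumed fusion weight
    attention_weight: float = 0.4,  # assumed fusion weight
    min_score: float = 0.3,         # assumed "no active speaker" threshold
) -> Optional[str]:
    """Fuse group-level visual attention with an AV-sync score and return
    the most likely active speaker, or None if nobody clears the threshold."""
    best_id, best_score = None, min_score
    for p in people:
        # Normalize attention votes by the number of possible observers.
        attention = p.attention_votes / max(group_size - 1, 1)
        score = sync_weight * p.av_sync_score + attention_weight * attention
        if score > best_score:
            best_id, best_score = p.person_id, score
    return best_id


if __name__ == "__main__":
    frame = [
        PersonFrame("alice", attention_votes=3, av_sync_score=0.8),
        PersonFrame("bob", attention_votes=1, av_sync_score=0.2),
    ]
    print(pick_active_speaker(frame, group_size=4))  # -> "alice"
```

In this toy fusion, the attention term captures the social cue that listeners tend to look at the current speaker, while the sync term supplies a speaker-independent audio-visual check; weighting and thresholding choices are purely illustrative.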