Group-Level Focus of Visual Attention for Improved Active Speaker Detection

2021 
This work addresses the problem of active speaker detection in physically situated multiparty interactions. This challenge requires a robust solution that can perform effectively across a wide range of speakers and physical contexts. Current state-of-the-art active speaker detection approaches rely on machine learning methods that do not generalize well to new physical settings. We find that these methods do not transfer well even between similar datasets. We propose the use of group-level focus of visual attention in combination with a general audio-video synchronizer method for improved active speaker detection across speakers and physical contexts. Our dataset-independent experiments demonstrate that the proposed approach outperforms state-of-the-art methods trained specifically for the task of active speaker detection.
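As a rough illustration of the idea (not the authors' implementation), the sketch below fuses two per-person signals for a single video frame: a group-level attention score (how many other participants are looking at a given person) and an audio-visual synchronization score for that person's face crop. All names, fusion weights, and thresholds here are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonFrame:
    """Per-person, per-frame signals (placeholder values, not from the paper)."""
    person_id: str
    attention_votes: int   # number of other participants gazing at this person
    av_sync_score: float   # audio-video sync confidence for this person's face crop


def pick_active_speaker(
    people: List[PersonFrame],
    group_size: int,
    sync_weight: float = 0.6,       # assumed fusion weight
    attention_weight: float = 0.4,  # assumed fusion weight
    min_score: float = 0.3,         # assumed "no active speaker" threshold
) -> Optional[str]:
    """Fuse group-level visual attention with an AV-sync score and return
    the most likely active speaker, or None if nobody clears the threshold."""
    best_id, best_score = None, min_score
    for p in people:
        # Normalize attention votes by the number of possible observers.
        attention = p.attention_votes / max(group_size - 1, 1)
        score = sync_weight * p.av_sync_score + attention_weight * attention
        if score > best_score:
            best_id, best_score = p.person_id, score
    return best_id


if __name__ == "__main__":
    frame = [
        PersonFrame("alice", attention_votes=3, av_sync_score=0.8),
        PersonFrame("bob", attention_votes=1, av_sync_score=0.2),
    ]
    print(pick_active_speaker(frame, group_size=4))  # -> "alice"
```

In this toy fusion, the attention term captures the social cue that listeners tend to look at the current speaker, while the sync term supplies a speaker-independent audio-visual check; weighting and thresholding choices are purely illustrative.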