Speaker Naming in TV programs Based on Speaker Role Recognition

Mohamed Lazhar Bellagha, Mounir Zrigui

2020

Automatic speaker naming in TV programs consists of identifying each speaker by their real identity. The task is challenging in the absence of a priori knowledge and requires an accurate association of the text and audio modalities. Current speaker naming architectures cannot name speakers whose names are absent and instead assign them a generic “someone” label. To address this limitation, we integrate speaker role recognition into the naming process. We propose a multimodal deep neural network architecture that jointly processes audio and text to identify each speaker's role, and we integrate speaker role recognition into a complete naming architecture using different strategies. We evaluate our model on the test set of the Arabic Multi-Genre Broadcast challenge dataset, which consists of 17 TV programs from Aljazeera. Our experiments show that identifying a speaker's role from voice and text can significantly improve speaker naming results.
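
The idea of falling back to a recognized role when no name is available can be illustrated with a minimal sketch. It does not reproduce the paper's architecture or any of its integration strategies: the `Segment` fields, the `name_speakers` function, and the role labels are all hypothetical, and the role is assumed to come from some upstream classifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One speech segment; all fields are illustrative assumptions."""
    speaker_id: str               # speaker cluster label, e.g. from diarization
    detected_name: Optional[str]  # name found in the text, or None if absent
    predicted_role: str           # output of a hypothetical role classifier


def name_speakers(segments: List[Segment], fallback: str = "someone") -> Dict[str, str]:
    """Label each speaker with a name if one was found, else with a role."""
    labels: Dict[str, str] = {}
    for seg in segments:
        if seg.detected_name:
            # A detected name always takes priority over a role label.
            labels[seg.speaker_id] = seg.detected_name
        elif seg.speaker_id not in labels:
            # No name available: use the role instead of the generic label.
            labels[seg.speaker_id] = seg.predicted_role or fallback
    return labels


segments = [
    Segment("spk1", None, "anchor"),
    Segment("spk2", None, "guest"),
    Segment("spk1", "Example Name", "anchor"),
    Segment("spk3", None, ""),
]
labels = name_speakers(segments)
```

In this sketch a speaker who is never named receives a role label such as "guest", and the generic label is used only when neither a name nor a role is available.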
