Speaker Naming in TV programs Based on Speaker Role Recognition

2020 
Automatic speaker naming in TV programs consists of identifying each speaking person by their real identity. This is a challenging task in the absence of a priori knowledge, and it requires an accurate association of the text and audio modalities. Current speaker naming architectures cannot name speakers when their names are absent, and replace them with a generic “someone” label. To address this challenge, we propose integrating speaker role recognition into the naming process. In this paper, we propose a multimodal deep neural network architecture that jointly processes audio and text to identify the speaker's role, and we integrate speaker role recognition into a complete naming architecture using different strategies. We evaluate our model on the test part of the Arabic Multi-Genre Broadcast challenge dataset, which consists of 17 TV programs from Aljazeera. Our experiments show that identifying a speaker's role through voice and text can significantly improve speaker naming results.