Investigating Deep Neural Networks for Speaker Diarization in the DIHARD Challenge

Ivan Himawan, Hafizur Rahman, Sridha Sridharan, Clinton Fookes, Ahilan Kanagasundaram

2018

We investigate the use of deep neural networks (DNNs) for speaker diarization, with the aim of improving performance under domain-mismatched conditions. Three unsupervised domain adaptation techniques, namely inter-dataset variability compensation (IDVC), domain-invariant covariance normalization (DICN), and domain mismatch modeling (DMM), are applied to DNN-based speaker embeddings to compensate for the mismatch in the embedding subspace. We report experimental results on the DIHARD data, which was released for the 2018 diarization challenge. Because this data was collected from a diverse set of domains, it presents very challenging domain-mismatched conditions for diarization. Our results provide insights into how the performance of the proposed system could be further improved.
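
As an illustration of the general idea behind compensating for domain mismatch in an embedding subspace, the following is a minimal sketch of an IDVC-style projection, assuming that each embedding carries a domain (dataset) label and that the mismatch is captured by the directions spanned by the per-domain mean embeddings. The function names `idvc_projection` and `apply_projection`, the parameter `n_dims`, and the use of NumPy are assumptions made for this example; this is not the system evaluated in the paper.

```python
import numpy as np


def idvc_projection(embeddings, domain_labels, n_dims):
    """Build a projection that removes the top `n_dims` directions
    spanned by the per-domain mean embeddings (hypothetical helper)."""
    embeddings = np.asarray(embeddings, dtype=float)
    domain_labels = np.asarray(domain_labels)

    # One mean embedding per domain.
    domains = np.unique(domain_labels)
    means = np.stack(
        [embeddings[domain_labels == d].mean(axis=0) for d in domains]
    )

    # Center the domain means so only between-domain variation remains.
    centered = means - means.mean(axis=0, keepdims=True)

    # Right singular vectors give an orthonormal basis of that variation.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_dims]  # shape: (n_dims, embedding_dim)

    # Orthogonal projection onto the complement of the nuisance subspace.
    dim = embeddings.shape[1]
    return np.eye(dim) - basis.T @ basis


def apply_projection(embeddings, projection):
    """Apply the (symmetric) projection to row-vector embeddings."""
    return np.asarray(embeddings, dtype=float) @ projection
```

With two domains, the centered means span a single direction, so `n_dims=1` removes the offset between them; with more domains, `n_dims` would need to be tuned on held-out data.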

Keywords:

Speech recognition
Computer science
Pattern recognition
Speaker diarisation
Artificial intelligence
Artificial neural network
Task analysis
Embedding
Feature extraction
Normalization (statistics)
Subspace topology
Hidden Markov model
