Review of different robust x-vector extractors for speaker verification
2021
Recently, the x-vector framework, extracted with deep neural network architectures, became the state-of-the-art method for speaker verification. Although another level of performance has been overcome with this approach, fine-tuning and optimizing the hyper-parameters of a deep neural network to obtain a robust x-vector extractor is cost- and time-consuming. Several approaches have been proposed to train robust x-vector extractors. In this paper, we propose to review and analyse the impact of the most significant x-vector related approaches, including variations in terms of data augmentation, number of epochs, size of mini-batch, acoustic features and frames per iteration. By applying these approaches to the default recipe provided in the Kaldi toolkit, we observed a significant relative gain of more than 50% in terms of EER on Speaker in the Wild and Voxceleb1-E datasets.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
20
References
2
Citations
NaN
KQI