Review of different robust x-vector extractors for speaker verification

Mickael Rouvier, Richard Dufour, Pierre-Michel Bousquet

2021

Recently, the x-vector framework, based on deep neural network architectures, has become the state-of-the-art method for speaker verification. Although this approach has reached a new level of performance, fine-tuning and optimizing the hyper-parameters of a deep neural network to obtain a robust x-vector extractor is costly and time-consuming. Several approaches have been proposed to train robust x-vector extractors. In this paper, we review and analyse the impact of the most significant x-vector-related approaches, including variations in data augmentation, number of epochs, mini-batch size, acoustic features and frames per iteration. By applying these approaches to the default recipe provided in the Kaldi toolkit, we observed a significant relative gain of more than 50% in terms of equal error rate (EER) on the Speakers in the Wild and VoxCeleb1-E datasets.
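
The abstract reports its result as a relative gain in EER. As a minimal sketch of what that metric means (this is not the authors' evaluation code or the Kaldi scoring script; the function names and all scores below are hypothetical, chosen only for illustration), the EER is the operating point where the false acceptance rate equals the false rejection rate, and the relative gain is the EER reduction divided by the baseline EER:

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_gain(baseline_eer, improved_eer):
    """Relative EER reduction, expressed as a fraction of the baseline EER."""
    return (baseline_eer - improved_eer) / baseline_eer


# Hypothetical verification scores, not taken from the paper.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25

# Hypothetical EERs in percent: halving the EER is a 50% relative gain.
print(relative_gain(4.0, 2.0))  # 0.5
```

With these made-up numbers, a system whose EER drops from 4.0% to 2.0% shows a relative gain of 50%, the order of magnitude the abstract reports. Real evaluations typically interpolate the error curves rather than picking the nearest observed threshold.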
