Combining I-vector and ResNet by Knowledge Distillation for Text-Independent Speaker Verification
2021
For years, the i-vector + PLDA model dominated the text-independent speaker verification task; only recently have Deep Neural Networks and metric-learning methods become popular. In this paper, we combine these two approaches via Knowledge Distillation. First, we propose a residual neural network based on ResNet34, trained with the AM-Softmax loss. Then an MSE loss between the ResNet embedding and the i-vector is used for distillation. Experiments on the VoxCeleb1 dataset show that the proposed ResNet outperforms the widely used i-vector and x-vector methods. With only the MSE loss, the ResNet still performs better than the i-vector, indicating good generalization ability. Combining the two losses further improves the results, showing that distilling knowledge from the i-vector improves the performance of the model with a PLDA backend.
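The training objective described above combines a classification loss (AM-Softmax) with an MSE distillation term between the ResNet embedding and the i-vector. The following NumPy sketch illustrates how the two terms might be combined; the scale `s`, margin `m`, weight `alpha`, and the assumption that the ResNet embedding and i-vector share the same dimensionality are illustrative choices, not values from the paper.

```python
import numpy as np

def am_softmax_loss(embeddings, class_weights, labels, s=30.0, m=0.35):
    """AM-Softmax: cross-entropy over scaled cosine logits with an
    additive margin m subtracted from the target-class cosine."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=0, keepdims=True)
    cos = e @ w                                   # (batch, n_classes)
    idx = np.arange(len(labels))
    logits = s * cos
    logits[idx, labels] = s * (cos[idx, labels] - m)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()

def combined_loss(resnet_emb, ivectors, labels, class_weights, alpha=1.0):
    """Total loss = AM-Softmax + alpha * MSE(ResNet embedding, i-vector).
    Assumes resnet_emb and ivectors have matching shape (batch, dim)."""
    mse = np.mean((resnet_emb - ivectors) ** 2)
    return am_softmax_loss(resnet_emb, class_weights, labels) + alpha * mse
```

Setting `alpha=0` recovers pure AM-Softmax training, while a large `alpha` pushes the ResNet embeddings toward the i-vector space, which is the distillation effect the abstract reports as beneficial for the PLDA backend.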