Combining I-vector and ResNet by Knowledge Distillation for Text-Independent Speaker Verification

2021 
For years, the i-vector + PLDA model dominated the text-independent speaker verification task, until deep neural networks and metric-learning methods recently became popular. In this paper, we combine the two approaches via knowledge distillation. First, we propose a residual neural network based on ResNet34, trained with the AM-Softmax loss. An MSE loss between the ResNet embedding and the i-vector is then used for distillation. Experiments on the VoxCeleb1 dataset show that the proposed ResNet outperforms the widely used i-vector and x-vector methods. With the MSE loss alone, the ResNet still performs better than the i-vector, indicating good generalization ability. Combining the two losses improves the results further, showing that distilling knowledge from the i-vector benefits a model with a PLDA backend.
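The training objective described above pairs a classification loss (AM-Softmax over speaker identities) with an MSE distillation loss that pulls the ResNet embedding toward the i-vector. The sketch below illustrates the shape of such a combined loss in plain Python; the scale `s`, margin `m`, and weight `lam` are illustrative assumptions, not the paper's reported hyperparameters.

```python
import math

def am_softmax_loss(cosines, label, s=30.0, m=0.35):
    """AM-Softmax (additive margin) loss for one sample.

    cosines: cosine similarities between the embedding and each
             speaker's class weight vector.
    label:   index of the true speaker.
    s, m:    scale and additive margin (illustrative values).
    """
    # Subtract the margin only from the target-class cosine, then scale.
    logits = [s * (c - m) if j == label else s * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp for the softmax denominator.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[label]  # = -log softmax(target)

def mse_distill_loss(embedding, ivector):
    """MSE between the student (ResNet) embedding and the i-vector teacher."""
    return sum((e - t) ** 2 for e, t in zip(embedding, ivector)) / len(embedding)

def combined_loss(cosines, label, embedding, ivector, lam=1.0):
    """Weighted sum of the two losses; lam is a hypothetical trade-off weight."""
    return am_softmax_loss(cosines, label) + lam * mse_distill_loss(embedding, ivector)
```

A correct target cosine yields a lower AM-Softmax loss than an incorrect one, and the distillation term vanishes when the student embedding matches the i-vector exactly.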