Combining I-vector and ResNet by Knowledge Distillation for Text-Independent Speaker Verification

2021 
For years, the i-vector + PLDA model dominated the text-independent speaker verification task, until deep neural networks and metric-learning methods recently became popular. In this paper, we combine the two approaches via knowledge distillation. First, we propose a residual neural network based on ResNet34, trained with the AM-Softmax loss. An MSE loss between the ResNet embedding and the i-vector is then used for distillation. Experiments on the VoxCeleb1 dataset show that the proposed ResNet outperforms the widely used i-vector and x-vector methods. With the MSE loss alone, the ResNet still performs better than the i-vector, indicating good generalization ability. Combining the two losses improves the results further, showing that distilling knowledge from the i-vector benefits a model with a PLDA backend.
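The training objective described above pairs a classification loss (AM-Softmax over speaker identities) with an MSE distillation loss that pulls the ResNet embedding toward the i-vector. The sketch below illustrates the shape of such a combined loss in plain Python; the scale `s`, margin `m`, and weight `lam` are illustrative assumptions, not the paper's reported hyperparameters.

```python
import math

def am_softmax_loss(cosines, label, s=30.0, m=0.35):
    """AM-Softmax (additive margin) loss for one sample.

    cosines: cosine similarities between the embedding and each
             speaker's class weight vector.
    label:   index of the true speaker.
    s, m:    scale and additive margin (illustrative values).
    """
    # Subtract the margin only from the target-class cosine, then scale.
    logits = [s * (c - m) if j == label else s * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp for the softmax denominator.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[label]  # = -log softmax(target)

def mse_distill_loss(embedding, ivector):
    """MSE between the student (ResNet) embedding and the i-vector teacher."""
    return sum((e - t) ** 2 for e, t in zip(embedding, ivector)) / len(embedding)

def combined_loss(cosines, label, embedding, ivector, lam=1.0):
    """Weighted sum of the two losses; lam is a hypothetical trade-off weight."""
    return am_softmax_loss(cosines, label) + lam * mse_distill_loss(embedding, ivector)
```

A correct target cosine yields a lower AM-Softmax loss than an incorrect one, and the distillation term vanishes when the student embedding matches the i-vector exactly.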