Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification

Chenglong Wang, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian

2021

Attention-based models have recently shown powerful representation learning ability in speaker recognition. However, most attention-based models apply attention primarily in the pooling layers. In this work, we present an end-to-end speaker verification system that leverages time-frequency and channel features hierarchically. To further improve system performance, we employ Large Margin Cosine Loss as the loss function to optimize the model. We carry out experiments on the VoxCeleb1 dataset to evaluate the effectiveness of our methods. The results suggest that our best system outperforms the i-vector + PLDA and x-vector systems by 53.3% and 7.6%, respectively.
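
For readers unfamiliar with the Large Margin Cosine Loss mentioned above, the following is a minimal sketch of its standard formulation for a single training example. It is not the authors' implementation: the function names are hypothetical, and the scale `s=30.0` and margin `m=0.35` are assumed illustrative values, since the abstract does not state the settings used.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def lmcl_loss(embedding, class_weights, target, s=30.0, m=0.35):
    """Large Margin Cosine Loss for one example (illustrative sketch).

    embedding     : speaker embedding (list of floats)
    class_weights : one weight vector per training speaker
    target        : index of the true speaker
    s, m          : scale and cosine margin (assumed values, not from the paper)
    """
    e = l2_normalize(embedding)
    # Cosine similarity between the embedding and every class weight vector.
    cosines = [
        sum(a * b for a, b in zip(e, l2_normalize(w))) for w in class_weights
    ]
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [
        s * (c - m) if j == target else s * c for j, c in enumerate(cosines)
    ]
    # Numerically stable negative log-softmax of the target logit.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


weights = [[1.0, 0.0], [0.0, 1.0]]
loss_with_margin = lmcl_loss([0.9, 0.1], weights, target=0)
loss_without_margin = lmcl_loss([0.9, 0.1], weights, target=0, m=0.0)
```

With `m=0.0` the function reduces to ordinary softmax cross-entropy on scaled cosine similarities. A positive margin raises the loss for the same embedding, which pushes the model to separate speakers by a wider angular gap.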

Keywords:

Time–frequency analysis
Computer science
Communication channel
Speaker verification
Speech recognition
