Deep neural networks for small footprint text-dependent speaker verification
2014
In this paper we investigate the use of deep neural networks (DNNs) for a small footprint text-dependent speaker verification task. At development stage, a DNN is trained to classify speakers at the frame-level. During speaker enrollment, the trained DNN is used to extract speaker specific features from the last hidden layer. The average of these speaker features, or d-vector, is taken as the speaker model. At evaluation stage, a d-vector is extracted for each utterance and compared to the enrolled speaker model to make a verification decision. Experimental results show the DNN based speaker verification system achieves good performance compared to a popular i-vector system on a small footprint text-dependent speaker verification task. In addition, the DNN based system is more robust to additive noise and outperforms the i-vector system at low False Rejection operating points. Finally the combined system outperforms the i-vector system by 14% and 25% relative in equal error rate (EER) for clean and noisy conditions respectively.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
24
References
667
Citations
NaN
KQI