Deep neural networks for small footprint text-dependent speaker verification

Ehsan Variani, Xin Lei, Erik McDermott, Ignacio Lopez-Moreno, Javier González Domínguez

2014

In this paper we investigate the use of deep neural networks (DNNs) for a small footprint text-dependent speaker verification task. In the development stage, a DNN is trained to classify speakers at the frame level. During speaker enrollment, the trained DNN is used to extract speaker-specific features from the last hidden layer. The average of these speaker features, or d-vector, is taken as the speaker model. In the evaluation stage, a d-vector is extracted for each utterance and compared to the enrolled speaker model to make a verification decision. Experimental results show that the DNN-based speaker verification system achieves good performance compared to a popular i-vector system on a small footprint text-dependent speaker verification task. In addition, the DNN-based system is more robust to additive noise and outperforms the i-vector system at low False Rejection operating points. Finally, the combined system outperforms the i-vector system by 14% and 25% relative in equal error rate (EER) for clean and noisy conditions, respectively.
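
The enrollment and evaluation stages described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-frame last-hidden-layer activations have already been computed by a trained DNN and are passed in as plain lists of floats. The abstract does not state how a d-vector is compared to the speaker model, so the cosine similarity, the per-frame L2 normalization, the threshold value, and all function names here are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def utterance_d_vector(frame_activations):
    """Average the per-frame activations of one utterance into a d-vector.

    Assumption: each frame is L2-normalized before averaging.
    """
    return mean_vector([l2_normalize(frame) for frame in frame_activations])


def enroll(utterances):
    """Speaker model: the average of the d-vectors of the enrollment utterances."""
    return mean_vector([utterance_d_vector(utt) for utt in utterances])


def cosine_score(a, b):
    """Cosine similarity between two vectors (assumed comparison measure)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def verify(speaker_model, test_utterance, threshold=0.5):
    """Accept the claimed identity if the score reaches the (hypothetical) threshold."""
    score = cosine_score(speaker_model, utterance_d_vector(test_utterance))
    return score >= threshold, score


# Toy 2-dimensional activations standing in for real DNN outputs.
enrollment = [[[1.0, 0.0], [0.9, 0.1]], [[1.0, 0.1]]]
model = enroll(enrollment)
accepted, score = verify(model, [[0.95, 0.05]])
```

In practice the threshold would be tuned on held-out data to reach a chosen operating point, such as the equal error rate mentioned in the abstract.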
