Deep neural networks for small footprint text-dependent speaker verification

2014 
In this paper we investigate the use of deep neural networks (DNNs) for a small footprint text-dependent speaker verification task. At development stage, a DNN is trained to classify speakers at the frame-level. During speaker enrollment, the trained DNN is used to extract speaker specific features from the last hidden layer. The average of these speaker features, or d-vector, is taken as the speaker model. At evaluation stage, a d-vector is extracted for each utterance and compared to the enrolled speaker model to make a verification decision. Experimental results show the DNN based speaker verification system achieves good performance compared to a popular i-vector system on a small footprint text-dependent speaker verification task. In addition, the DNN based system is more robust to additive noise and outperforms the i-vector system at low False Rejection operating points. Finally the combined system outperforms the i-vector system by 14% and 25% relative in equal error rate (EER) for clean and noisy conditions respectively.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    667
    Citations
    NaN
    KQI
    []