Depthwise Separable Convolutions for Short Utterance Speaker Identification

2019 
At present, traditional speaker identification methods perform poorly on short test utterances. In addition, deep-learning-based speaker identification often requires a large amount of training data when the speaker set is large. To address these two issues, this paper presents a novel deep convolutional neural network (CNN) model based on depthwise separable convolutions for text-independent speaker identification under limited training speech and short test utterances. The CNN model automatically learns deep speaker features from spectrograms generated from speech segments. With only 1 min of speech per speaker for training, the proposed model achieves the highest recognition rate compared with two classic models, reaching 99.08% accuracy on test sets consisting of 4 s speech segments. In addition, we propose a method to expand the effective region of spectrograms generated from test speech shorter than 4 s, which produces a new spectrogram by copying the effective part of the original spectrogram into the blank part. Using this method, the proposed speaker identification system significantly improves the classification of short utterances (< 4 s).
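
The abstract does not give the layer definitions, but a depthwise separable convolution block of the kind the title refers to can be sketched in PyTorch as follows. The layer names, kernel size, and the BatchNorm/ReLU ordering are illustrative assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A minimal sketch of a depthwise separable convolution: a per-channel
    (depthwise) convolution followed by a 1x1 (pointwise) convolution.
    This factorization uses far fewer parameters than a standard convolution
    with the same receptive field, which suits limited training data."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        # groups=in_channels makes each filter act on one input channel only
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   stride=stride, padding=kernel_size // 2,
                                   groups=in_channels, bias=False)
        # 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))
```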
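The spectrogram-expansion method is described only as copying the effective part of the spectrogram into the blank part. The sketch below assumes the simplest reading, tiling the effective region along the time axis until the 4 s target width is filled; the function name `extend_spectrogram` and the `target_frames` parameter are hypothetical:

```python
import numpy as np

def extend_spectrogram(spec: np.ndarray, target_frames: int) -> np.ndarray:
    """Fill a short-utterance spectrogram up to `target_frames` time frames
    by repeatedly copying its effective region into the blank part.

    spec: 2D array (freq_bins, effective_frames) holding only the effective
          region of an utterance shorter than the target duration.
    """
    freq_bins, eff_frames = spec.shape
    if eff_frames >= target_frames:
        # Already long enough: just crop to the target width.
        return spec[:, :target_frames]
    # Repeat the effective region enough times to cover the blank part,
    # then trim to exactly the target number of frames.
    reps = int(np.ceil(target_frames / eff_frames))
    return np.tile(spec, (1, reps))[:, :target_frames]
```

For example, a 2 s utterance whose spectrogram spans 200 frames would be tiled twice to reach a 400-frame (4 s) input, so the classifier always sees a fixed-size spectrogram.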