Voice presentation attack detection using convolutional neural networks

2019 
Current state-of-the-art automatic speaker verification (ASV) systems are prone to spoofing. The security and reliability of ASV systems can be threatened by different types of spoofing attacks using voice conversion, synthetic speech, or recorded passphrase. It is therefore essential to develop countermeasure techniques which can detect such spoofed speech. Inspired by the success of deep learning approaches in various classification tasks, this work presents an in-depth study of convolutional neural networks (CNNs) for spoofing detection in automatic speaker verification (ASV) systems. Specifically, we have compared the use of three different CNNs architectures: AlexNet, CNNs with max-feature-map activation, and an ensemble of standard CNNs for developing spoofing countermeasures, and discussed their potential to avoid overfitting due to small amounts of training data that is usually available in this task. We used popular deep learning toolkits for the system implementation and have released the implementation code of our methods publicly. We have evaluated the proposed countermeasure systems for detecting replay attacks on recently released spoofing corpora ASVspoof 2017, and also provided in-depth visual analyses of CNNs to aid for future research in this area.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    62
    References
    1
    Citations
    NaN
    KQI
    []