Detecting Spoofed Speeches via Segment-Based Word CQCC and Average ZCR for Embedded Systems

2022 
Intelligent speech recognition is increasingly used in embedded systems, where it is seriously threatened by malicious speech spoofing attacks. Unlike conventional methods, this article proposes a segment-based anti-spoofing detection (SASD) method for quickly detecting spoofed speeches aimed at embedded speech recognition; it focuses on anti-spoofing features rather than on the content of the speech or the voiceprint of the speaker. Each speech is divided into word segments and silent segments. Based on constant-$Q$ cepstral coefficients (CQCCs), a word CQCC (WCQCC) extraction is first designed for the word segments. Then, based on the short-term zero-crossing rate (ZCR), an average ZCR (AZCR) extraction is devised for the silent segments. Combining the WCQCC of the word segments with the AZCR of the silent segments, a biased decision strategy is proposed to determine quickly whether a speech is spoofed. Extensive experiments on the ASVspoof 2021 datasets evaluate the effectiveness of the proposed method. Compared with existing methods, SASD improves the accuracy of anti-spoofing detection by up to 33.47% and reduces time overhead on embedded devices by up to 69.10%.
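
The abstract does not define AZCR precisely, so the following is only a minimal sketch of the general idea: compute the short-term ZCR frame by frame, then average it over the silent segments. The function names, the frame length of 400 samples, the hop of 160 samples, and the choice to pool all frames before averaging are assumptions made for illustration, not the authors' implementation.

```python
def short_term_zcr(samples, frame_len=400, hop=160):
    """Return the zero-crossing rate of each frame.

    The rate is the fraction of adjacent sample pairs whose signs differ.
    frame_len and hop are illustrative defaults (assumed, not from the paper).
    """
    rates = []
    # Segments shorter than one frame yield no frames at all.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        rates.append(crossings / (frame_len - 1))
    return rates


def average_zcr(silent_segments, frame_len=400, hop=160):
    """Average the frame-level ZCR over all silent segments.

    Pooling every frame and taking the mean is an assumed definition.
    Returns 0.0 when no segment is long enough to form a frame.
    """
    pooled = []
    for segment in silent_segments:
        pooled.extend(short_term_zcr(segment, frame_len, hop))
    if not pooled:
        return 0.0
    return sum(pooled) / len(pooled)
```

A caller would pass the sample lists of the detected silent segments to `average_zcr` and compare the result with a threshold chosen on training data; how the paper combines this value with the WCQCC in its biased decision is not stated in the abstract.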