Energy Separation Based Features for Replay Spoof Detection for Voice Assistant

2021 
Voice Assistants (VAs) now play a very important role in smart home applications. However, along with ease of use, VAs also bring security issues, such as the possibility of being attacked by replay, hidden voice commands, etc. This paper presents a replay Spoof Speech Detection (SSD) system for VAs using Energy Separation Algorithm (ESA)-based features to capture Instantaneous Amplitude and Frequency Cepstral Coefficients (i.e., ESA-IACC and ESA-IFCC), with a Gaussian Mixture Model (GMM) as the pattern classifier. The Teager Energy Operator (TEO) has the ability to suppress noise and hence is robust to noise. For noisy acoustic environments, the ESA-based features that employ the TEO perform well compared to the clean environment. We performed the experiments on the ReMASC database, which contains four different acoustic environments. The proposed features performed better in both clean and noisy environments. In addition, to obtain possible complementary information, we performed score-level fusion of ESA-IACC and ESA-IFCC, which resulted in a low Equal Error Rate (EER) for the different environments. Furthermore, we compared our proposed feature sets with Constant-Q Cepstral Coefficients (CQCC) and Linear Frequency Cepstral Coefficients (LFCC), resulting in a relative improvement in EER of approximately 21.88 % for clean environments and 66.34 % for noisy environments, respectively.
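The energy separation step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which ESA variant, filterbank, or cepstral post-processing the authors use, so the code below assumes the widely used DESA-2 form of the discrete ESA and applies it to a single narrowband signal. The function names `teager_energy` and `desa2` and the `eps` guard are illustrative choices, not taken from the paper.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Returns an array two samples shorter than x (the end points are dropped).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 energy separation (assumed variant, not confirmed by the source).

    Estimates the instantaneous amplitude |a(n)| and the instantaneous
    frequency omega(n), in radians per sample, of a narrowband signal.
    Both outputs are four samples shorter than x.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_energy(x)          # aligned with x[1:-1]
    y = x[2:] - x[:-2]                # symmetric difference y(n) = x(n+1) - x(n-1)
    psi_y = teager_energy(y)          # aligned with x[2:-2]
    psi_x = psi_x[1:-1]               # trim so both energies share the same time axis

    # eps guards against division by zero in silent or degenerate segments.
    ratio = psi_y / (2.0 * np.maximum(psi_x, eps))
    omega = 0.5 * np.arccos(np.clip(1.0 - ratio, -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, omega


if __name__ == "__main__":
    # Synthetic test tone: constant amplitude 0.8, frequency 0.3 rad/sample.
    n = np.arange(200)
    tone = 0.8 * np.cos(0.3 * n + 0.1)
    amp, omega = desa2(tone)
    print(amp.mean(), omega.mean())
```

For a pure cosine with frequency below a quarter of the sampling rate, these estimates recover the amplitude and frequency up to floating-point error. Turning such envelopes into cepstral features like ESA-IACC and ESA-IFCC would presumably require subband filtering and further processing that the abstract does not describe, so that part is left out.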