Low frequency frame-wise normalization over constant-Q transform for playback speech detection

2019 
Abstract Playback speech carries information about the acoustic environment, the playback device, and the recorder used. This work proposes a novel normalization scheme, low frequency frame-wise normalization (LFFN), as a module in the feature extraction process that is hypothesized to help capture artifacts in playback speech. As the name indicates, it processes the low frequency bins frame by frame. Constant-Q transform (CQT) based features are known to provide benchmark results for the detection of spoofing attacks. In this work, LFFN is combined with CQT to extract two new features, from the octave and linear power spectra, respectively. The first, obtained by CQT, LFFN, and octave segmentation, is referred to as constant-Q normalization segmentation coefficients (CQNSC). The second combines the conventional constant-Q cepstral coefficient (CQCC) with LFFN to obtain constant-Q normalization cepstral coefficients (CQNCC). The studies are performed on the ASVspoof 2017 version 2.0 corpus, which is designed for studying playback speech detection. The experimental results show the effectiveness of the proposed LFFN with CQT based features: we obtain equal error rates of 10.63% and 10.31% for the CQNSC and CQNCC features, respectively, on the evaluation set of the ASVspoof 2017 version 2.0 corpus.
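The equal error rate reported above is the operating point at which the false acceptance rate (spoofed trials accepted as genuine) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how such a figure can be computed from detection scores. It is not the evaluation code used in this work: the function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption (not from the paper): a trial is accepted as genuine
    when its score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, spoof))  # prints 0.25
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch returns the average of the two at the closest threshold. Evaluation toolkits often interpolate between thresholds instead, which can give slightly different values.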