Spectro-temporal features for noise-robust speech recognition using power-law nonlinearity and power-bias subtraction

2013 
Previous work has demonstrated that spectro-temporal Gabor features reduced word error rates for automatic speech recognition under noisy conditions. However, the features based on mel spectra were easily corrupted in the presence of noise or channel distortion. We have exploited an algorithm for power normalized cepstral coefficients (PNCCs) to generate a more robust spectro-temporal representation. We refer to it as power normalized spectrum (PNS), and to the corresponding output processed by Gabor filters and MLP nonlinear weighting as PNS-Gabor. We show that the proposed feature outperforms state-of-the-art noise-robust features, ETSI-AFE and PNCC for both Aurora2 and a noisy version of the Wall Street Jounal (WSJ) corpus. A comparison of the individual processing steps of mel spectra and PNS shows that power bias subtraction is the most important aspect of PNS-Gabor features to provide an improvement over Mel-Gabor features. The result indicates that Gabor processing compensates the limitation of PNCC for channels with frequency-shift characteristic. Overall, PNS-Gabor features decrease the word error rate by 32% relative to MFCC and 13% relative to PNCC in Aurora2. For noisy WSJ, they decrease the word error rate by 30.9% relative to MFCC and 24.7% relative to PNCC.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    17
    References
    15
    Citations
    NaN
    KQI
    []