Dual-channel VTS feature compensation for noise-robust speech recognition on mobile devices

2017 
One way to improve automatic speech recognition (ASR) performance on the latest mobile devices, which can be used in a variety of noisy environments, is to take advantage of the small microphone arrays embedded in them. Since the performance of classic beamforming techniques with small microphone arrays is rather limited, specific techniques are being developed to exploit this novel feature efficiently for noise-robust ASR. In this study, a novel dual-channel minimum mean square error (MMSE)-based feature compensation method is proposed that relies on a vector Taylor series (VTS) expansion of a dual-channel speech distortion model. In contrast to the single-channel VTS approach (which can be considered the state of the art for feature compensation), the authors’ technique benefits in particular from the spatial properties of speech and noise. Their proposal is assessed on a dual-microphone smartphone (a particular case of interest) by means of the AURORA2-2C synthetic corpus. Word recognition results, also validated with real noisy speech data, show that their method is more accurate: it clearly outperforms minimum variance distortionless response (MVDR) beamforming and a single-channel VTS feature compensation approach, especially at low signal-to-noise ratios.
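For orientation, the single-channel baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used log-spectral distortion model for additive noise, y = x + log(1 + exp(n - x)), and linearizes it with a first-order Taylor expansion. The abstract does not state the dual-channel distortion model, so it is not reproduced here; the function names `distort`, `vts_linearize`, and `distort_vts` are hypothetical and chosen only for this illustration.

```python
import math


def distort(x, n):
    """Noisy log-spectral feature under the assumed additive-noise model.

    x: clean-speech log-energy, n: noise log-energy (same frequency bin).
    Equivalent to log(exp(x) + exp(n)).
    """
    return x + math.log1p(math.exp(n - x))


def vts_linearize(x0, n0):
    """First-order Taylor expansion of distort() around the point (x0, n0).

    Returns the value at the expansion point and the two partial derivatives.
    """
    y0 = distort(x0, n0)
    dy_dx = 1.0 / (1.0 + math.exp(n0 - x0))  # d y / d x
    dy_dn = 1.0 - dy_dx                      # d y / d n
    return y0, dy_dx, dy_dn


def distort_vts(x, n, x0, n0):
    """Linearized approximation of distort(x, n) near (x0, n0)."""
    y0, dy_dx, dy_dn = vts_linearize(x0, n0)
    return y0 + dy_dx * (x - x0) + dy_dn * (n - n0)


if __name__ == "__main__":
    exact = distort(0.1, -0.1)
    approx = distort_vts(0.1, -0.1, 0.0, 0.0)
    print(exact, approx)
```

In VTS feature compensation, a linearization of this kind is typically applied around the means of the clean-speech and noise models so that the noisy-speech statistics stay tractable for an MMSE estimate. The sketch works on a single scalar bin only and makes no claim about how the authors extend the expansion to two channels.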