Improving ASR performance on PDA by contamination of training data

2004 
Automatic Speech Recognition (ASR) on a Personal Digital Assistant (PDA) suffers from the intrinsic hardware characteristics of the audio interface, for example low-quality microphones and internal device noise. In this paper, we propose to compensate for these weaknesses by contaminating clean training data with the distortion sources that are specific to the target device. We present a method to estimate both the frequency response of the audio acquisition channel and the internal additive noise from a few tens of minutes of recordings made on the PDA. The channel characteristics are estimated from the long-term power spectra of clean speech and of the PDA recordings, while the noise power spectrum is estimated during silence segments in these recordings. All recordings are made under controlled conditions, i.e. a quiet environment with no reverberation, to ensure that we measure only the internal device characteristics. The PDA-specific training data are then obtained by filtering the clean training data with the audio channel frequency response and contaminating them with the internal noise, and a dedicated acoustic model is finally trained for the target device. Recognition tests were performed on digit sequences on three different PDAs. Our approach was compared with other channel- and noise-robust methods and shows very competitive performance.
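The estimation and contamination steps described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in the paper: the frame length, the Hann window, the linear-phase FIR design, the synthesis of noise from shaped white noise, and all function names are assumptions made for illustration. The channel magnitude is taken as the square root of the ratio between the long-term power spectra of the PDA recording and of the clean speech, and the noise power spectrum is assumed to come from frames already labelled as silence.

```python
import numpy as np

N_FFT, HOP = 512, 256  # assumed analysis settings, not taken from the paper


def long_term_power_spectrum(x, n_fft=N_FFT, hop=HOP):
    """Average the power spectrum of Hann-windowed frames over the whole signal."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    spectra = np.fft.rfft(x[idx] * np.hanning(n_fft), axis=1)
    return np.mean(np.abs(spectra) ** 2, axis=0)


def estimate_channel_magnitude(clean, pda, noise_psd=None, eps=1e-12):
    """|H(f)| from the ratio of long-term power spectra (PDA over clean)."""
    p_clean = long_term_power_spectrum(clean)
    p_pda = long_term_power_spectrum(pda)
    if noise_psd is not None:
        # Optionally remove the additive noise power before taking the ratio.
        p_pda = np.maximum(p_pda - noise_psd, eps)
    return np.sqrt(p_pda / (p_clean + eps))


def magnitude_to_fir(mag):
    """Linear-phase FIR filter whose magnitude response approximates `mag`."""
    h = np.fft.irfft(mag)        # zero-phase impulse response
    h = np.roll(h, len(h) // 2)  # centre it so the filter is causal
    return h * np.hanning(len(h))


def synthesize_noise(noise_psd, n_samples, rng, n_fft=N_FFT):
    """Shape white noise so that its power spectrum approximates `noise_psd`."""
    gain = np.sqrt(noise_psd / np.sum(np.hanning(n_fft) ** 2))
    white = rng.standard_normal(n_samples)
    return np.convolve(white, magnitude_to_fir(gain), mode="same")


def contaminate(clean, channel_mag, noise_psd, rng):
    """Filter clean speech with the channel, then add device-like noise.

    Assumes `clean` is longer than the FIR filter (N_FFT samples).
    """
    filtered = np.convolve(clean, magnitude_to_fir(channel_mag), mode="same")
    return filtered + synthesize_noise(noise_psd, len(filtered), rng)
```

In such a setup, `noise_psd` would be obtained by calling `long_term_power_spectrum` on the silence segments of the PDA recordings, and `contaminate` would be applied to every clean training utterance before training the device-specific acoustic model.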