Automatic Speech Recognition Using Missing Data Techniques: Handling of Real-World Data

2011 
In this chapter, we investigate the performance of a missing data recognizer on real-world speech from the SPEECON and SpeechDat-Car databases. In previous work, we hypothesized that in real-world speech, which is corrupted not only by environmental noise but also by speaker, reverberation and channel effects, the features deemed ‘reliable’ do not match an acoustic model trained on clean speech. In a series of experiments, we test this hypothesis and explore to what extent performance can be improved by combining missing data techniques (MDT) with three conventional techniques: multi-condition training, dereverberation and feature enhancement. Our results confirm the hypothesis and show that the mismatch can be reduced by multi-condition training of the acoustic models and by feature enhancement, and that these effects combine to some degree. Our experiments with dereverberation reveal that reverberation can have a major impact on recognition performance, but that MDT with a suitable missing data mask can compensate for both environmental noise and reverberation simultaneously.
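The abstract relies on the idea of a missing data mask without spelling out how a recognizer uses one. The following is a minimal, generic sketch of that idea under stated assumptions. It is not the chapter's method. A binary mask marks each spectral channel as reliable when an estimated local SNR exceeds a threshold, with clean-speech power assumed to be estimated by simple spectral subtraction. An acoustic model state, assumed here to be a single diagonal-covariance Gaussian, is then scored by marginalization: unreliable channels are integrated out, which for a diagonal Gaussian amounts to dropping them from the sum. The names `snr_mask`, `marginal_log_likelihood` and `threshold_db` are hypothetical.

```python
import math


def snr_mask(noisy_power, noise_power, threshold_db=0.0):
    """Binary missing-data mask: True where the estimated local SNR exceeds threshold_db.

    Assumption (illustrative only): clean-speech power is estimated per channel
    by spectral subtraction, max(noisy - noise, floor).
    """
    floor = 1e-10  # avoids log of zero for fully masked channels
    mask = []
    for y, n in zip(noisy_power, noise_power):
        speech = max(y - n, floor)
        snr_db = 10.0 * math.log10(speech / max(n, floor))
        mask.append(snr_db > threshold_db)
    return mask


def marginal_log_likelihood(features, mask, means, variances):
    """Log-likelihood of a diagonal-covariance Gaussian using only reliable channels.

    Unreliable channels are marginalized out; with a diagonal covariance the
    integral over each missing dimension equals 1, so those terms are skipped.
    """
    ll = 0.0
    for x, reliable, mu, var in zip(features, mask, means, variances):
        if reliable:
            ll += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return ll
```

For example, `snr_mask([10.0, 1.5, 8.0], [1.0, 1.0, 1.0])` yields `[True, False, True]`: the middle channel has an estimated SNR of about -3 dB and is treated as missing, so an arbitrarily corrupted value there does not affect `marginal_log_likelihood`. Practical missing data recognizers often refine this with bounded marginalization or imputation, and with masks that also account for reverberation, as the abstract indicates.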