BOOSTING AUTOMATIC SPEECH RECOGNITION THROUGH ARTICULATORY INVERSION

2012 
This paper explores whether articulatory features predicted from speech acoustics through inversion can boost the recognition of context-dependent units when combined with acoustic features. For this purpose, we performed articulatory inversion on a corpus containing acoustic and electromagnetic articulography recordings from a single speaker. We then compared the performance of an HMM-based diphone classifier on the individual feature sets (acoustic, measured articulatory, and inversion-predicted articulatory) as well as on their combinations. To make good use of the limited corpus, we used a factorized representation that first classified diphones into broad overlapping categories and then combined the category decisions using a maximum-a-posteriori criterion. When comparing the individual feature sets, our results show no degradation in classification performance when predicted articulators are used instead of ground-truth articulators. Further, performance on the acoustic feature set improved by 10% when adding ground-truth articulators and by 5% when adding predicted articulators.
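
The factorized classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the broad categories or how their outputs are weighted, so the factor names (`first_phone_class`, `second_phone_class`), the toy posteriors, and the assumption that category posteriors multiply independently are all hypothetical choices made for illustration.

```python
import math

def map_diphone(category_posteriors, diphone_labels, priors=None):
    """Return the diphone with the highest combined (MAP) score.

    category_posteriors: factor name -> {category label -> posterior}
    diphone_labels:      diphone -> {factor name -> category label}
    priors:              optional diphone -> prior (uniform if omitted)

    Assumption (not from the paper): the broad-category posteriors are
    treated as independent, so their log-probabilities are summed.
    """
    best, best_score = None, -math.inf
    for diphone, labels in diphone_labels.items():
        prior = 1.0 if priors is None else priors[diphone]
        score = math.log(prior) if prior > 0 else -math.inf
        for factor, label in labels.items():
            p = category_posteriors[factor].get(label, 0.0)
            score += math.log(p) if p > 0 else -math.inf
        if score > best_score:
            best, best_score = diphone, score
    return best

# Hypothetical broad categories for three toy diphones.
diphone_labels = {
    "p-a": {"first_phone_class": "stop", "second_phone_class": "vowel"},
    "s-a": {"first_phone_class": "fricative", "second_phone_class": "vowel"},
    "a-p": {"first_phone_class": "vowel", "second_phone_class": "stop"},
}

# Toy posteriors, standing in for the outputs of the broad-category classifiers.
category_posteriors = {
    "first_phone_class": {"stop": 0.6, "fricative": 0.3, "vowel": 0.1},
    "second_phone_class": {"vowel": 0.8, "stop": 0.2},
}

print(map_diphone(category_posteriors, diphone_labels))
```

Because each broad category pools training examples from many diphones, every category classifier sees far more data than a single diphone model would, which is one plausible reading of why the factorization helps with a limited corpus.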