Automatic training set segmentation for multi-pass speech recognition

2005 
A common approach to automatic speech recognition uses two recognition passes to decode an utterance: the first pass limits the search to a smaller set of likely hypotheses; the second pass re-scores the limited set using more detailed acoustic models that may target gender or specific channels. A question raised by this architecture is how to define and train the second-pass models. We describe an extensible automatic solution that requires no manual gender or channel labeling. To train the second-pass models, we cluster the training data into datasets containing utterances whose acoustics are very similar across the entire utterance. The clustering is based on which regions of a more general acoustic model are activated during forced alignments. Experiments with commercial American-English digit strings show a 9.3% relative error rate reduction over a gender-based two-pass system with a similar number of model parameters.
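The clustering idea can be sketched in miniature. The following is a hypothetical illustration, not the paper's implementation: each utterance is summarized by a normalized count vector over the regions (e.g. states or Gaussians) of a general model activated in its forced alignment, and these vectors are grouped with a simple k-means pass. The `activation_vector` and `kmeans` helpers, the region count of 8, and the toy alignments are all invented for the example.

```python
import numpy as np

def activation_vector(alignment, n_regions):
    """Summarize an utterance by how often each region of the general
    acoustic model appears in its forced alignment (a list of region ids),
    normalized so utterance length does not matter."""
    counts = np.bincount(alignment, minlength=n_regions).astype(float)
    return counts / counts.sum()

def kmeans(X, k, iters=50):
    """Minimal k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        # pick the point farthest from all current centers
        dists = np.min([np.square(X - c).sum(1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # assign each utterance to its nearest center, then re-estimate
        d = np.square(X[:, None, :] - centers[None]).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy forced alignments: the first three utterances activate regions 0-3,
# the last three activate regions 4-7 (e.g. two broad acoustic conditions).
alignments = [
    [0, 1, 1, 2, 3, 0],
    [1, 2, 0, 3, 3],
    [0, 0, 2, 1],
    [4, 5, 6, 7, 4],
    [5, 5, 6, 4, 7],
    [7, 6, 4, 5, 5, 6],
]
X = np.array([activation_vector(a, n_regions=8) for a in alignments])
labels = kmeans(X, k=2)
```

Each resulting cluster would then supply the training set for one second-pass model, replacing manual gender or channel labels with labels derived from the alignments themselves.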