The SRC-B speech-to-text systems for OC16 Chinese-English mix-ASR challenge

2016 
In this paper, we describe our Chinese-English Speech-to-Text (STT) systems for the OC16 Chinese-English Mix-ASR Challenge. The challenge focuses on the transcription of Chinese-English code-mixing speech. Code-mixing speech, which refers to the intra-sentential switching between two languages within a spoken utterance, poses a very challenging task for the ASR research community. Our setup for this challenge consists of systems trained with the Kaldi speech recognition toolkit. The individual subsystems differ in their front-ends (e.g., MFCC+pitch, Fbank+pitch or FMLLR features), acoustic models (CD-GMM-HMM, CD-DNN-HMM or CD-LSTM-HMM), language models (with or without interpolation using development-set text), phone sets, and the subsets of permissible training data they were trained on. We combine the outputs of the different systems using ROVER to produce a final hypothesis with good overall performance. Our best system (CD-DNN-HMM-MPE), built with Kaldi, achieves a CER of 4.79% on the development set using the interpolated LM, while in combination with the CD-LSTM-HMM system we obtain a CER of 4.98% on the development set.