Improved phone recognition using excitation source features

2016 
Phone recognizers serve as the preprocessing unit for speech recognition systems and phonetic engines. Although most state-of-the-art speech recognition systems achieve relatively high accuracy at the sentence level, phone-level recognition performance falls well below sentence-level performance. The higher recognition rates at the sentence level are achieved with the help of refined language models for the language under consideration. The objective of the present work is therefore to improve the phone-level accuracy of hidden Markov model (HMM) based acoustic phone models by combining excitation source features with the conventional mel-frequency cepstral coefficients (MFCC) for American English. The TIMIT and CMU Arctic databases are used for the experiments. The average spectral energy around the zero-frequency region of each frame is used as the excitation source feature and is combined with the 13 MFCC features. The effectiveness of the combined features is confirmed by a 0.5% increase in phone recognition accuracy over state-of-the-art HMM-GMM acoustic models with MFCC features.
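The feature combination described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the frame length, hop size, width of the "zero-frequency region", or any compression of the energy, so the 25 ms / 10 ms framing at 16 kHz, the 100 Hz cutoff, and the log compression are all assumptions, and the function names are hypothetical. The 13 MFCCs are assumed to come from an existing front end and are passed in as a matrix.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def low_band_energy(x, sr, frame_len=400, hop=160, cutoff_hz=100.0):
    """Average spectral energy per frame in the band [0, cutoff_hz].

    frame_len, hop, and cutoff_hz are assumed values, not taken from the paper.
    """
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    band = freqs <= cutoff_hz
    return power[:, band].mean(axis=1)

def append_source_feature(mfcc, energy):
    """Append the (log-compressed, by assumption) energy as an extra column.

    mfcc: array of shape (n_frames, 13) from any MFCC front end.
    Returns an array of shape (n_frames, 14).
    """
    n = min(len(mfcc), len(energy))
    log_energy = np.log(energy[:n, None] + 1e-10)
    return np.hstack([mfcc[:n], log_energy])
```

With these assumed settings, a one-second signal sampled at 16 kHz yields 98 frames, so a matching `(98, 13)` MFCC matrix becomes a `(98, 14)` feature matrix that could then be fed to HMM-GMM training in place of the plain MFCC vectors.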