Low-Frequency Character Clustering for End-to-End ASR System

Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, Takeshi Kobayakawa, Takeshi Mishima, Shoei Sato, Akio Kobayashi

2018

We developed a label-design and restoration method for end-to-end automatic speech recognition (ASR) based on connectionist temporal classification (CTC). An end-to-end speech-recognition system with thousands of output labels, such as words or characters, is difficult to train robustly because of data sparsity. In our proposed method, characters with little training data are estimated from the context provided by a language model rather than from the acoustic features. The method involves two steps. First, we train acoustic models in which the thousands of low-frequency labels are replaced by 70 class labels. Second, the class labels are restored to the original labels by using a weighted finite-state transducer and an n-gram language model. We applied the proposed method to a Japanese end-to-end ASR system whose label set covers more than 3,000 characters. Experimental results indicate that our method reduced the word error rate by up to 15.5% relative to a conventional CTC-based method and achieved accuracy comparable to state-of-the-art hybrid DNN methods.
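The two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the function names `build_label_map` and `restore`, the frequency threshold `min_count`, the round-robin assignment of rare characters to classes (the abstract does not say how the 70 classes are formed), and the use of a plain Viterbi search over a caller-supplied bigram scorer in place of the weighted finite-state transducer and n-gram language model described above.

```python
from collections import Counter


def build_label_map(transcripts, min_count, num_classes):
    """Step 1 (sketch): keep frequent characters, fold rare ones into class labels.

    Assumption: rare characters are assigned to classes round-robin in sorted
    order; the actual clustering criterion is not given in the abstract.
    """
    counts = Counter(ch for text in transcripts for ch in text)
    label_map = {ch: ch for ch, n in counts.items() if n >= min_count}
    rare = sorted(ch for ch, n in counts.items() if n < min_count)
    for i, ch in enumerate(rare):
        label_map[ch] = f"<C{i % num_classes}>"
    return label_map


def restore(labels, label_map, bigram_logp):
    """Step 2 (sketch): pick the most likely character for each label.

    Runs a Viterbi search in which each label may expand to any character
    mapped to it, scored by bigram_logp(prev_char, char). This stands in for
    the WFST plus n-gram language model restoration.
    """
    members = {}
    for ch, lab in label_map.items():
        members.setdefault(lab, []).append(ch)

    # best[ch] = (log-probability, character path) of the best hypothesis ending in ch
    best = {"<s>": (0.0, [])}
    for lab in labels:
        nxt = {}
        for ch in members[lab]:
            score, path = max(
                ((s + bigram_logp(prev, ch), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[ch] = (score, path + [ch])
        best = nxt
    return "".join(max(best.values(), key=lambda t: t[0])[1])
```

In this sketch the acoustic model would be trained on the mapped label sequences, so it never has to discriminate between rare characters that share a class; that decision is deferred to the language-model scores passed in as `bigram_logp`.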
