A Convenient and Extensible Offline Chinese Speech Recognition System Based on Convolutional CTC Networks
2019
Deep learning methods are widely used in automatic speech recognition (ASR) and have brought significant improvements in accuracy. Deep CNN structures can significantly improve the performance of HMM-based speech recognition systems. In addition, CNN models have good translation invariance in the time-frequency domain, which makes them more robust to noise. In this paper, we use an acoustic model based on CNN+CTC+Self-Attention and a corresponding language model to construct an end-to-end Chinese speech recognition system, which serves as a pre-trained model. On this basis, without retraining the acoustic model, we propose a method that combines Levenshtein distance with hashing to construct an offline Chinese speech recognition system for a specific scene. Experimental results show that the deep convolutional CTC (Connectionist Temporal Classification) speech recognition model achieves an overall word error rate (WER) of 18% on the standard data sets THCHS-30 and Free ST Chinese Mandarin Corpus. In addition, the language model combining Levenshtein distance and hashing achieves an accuracy of more than 90% on specific phrases. The whole system is highly extensible and practical.
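The abstract does not spell out how Levenshtein distance and hashing are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes the scene-specific phrases are stored in a hash table (a Python dict), that an exact lookup is tried first, and that the recognizer's hypothesis otherwise falls back to the nearest stored phrase by edit distance. The names `levenshtein`, `match_phrase`, `phrase_table`, and the `max_ratio` threshold are hypothetical, and English strings stand in for the Chinese or pinyin sequences a real system would compare.

```python
from typing import Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def match_phrase(hypothesis: str,
                 phrase_table: Dict[str, str],
                 max_ratio: float = 0.34) -> Optional[Tuple[str, str]]:
    """Map a recognizer hypothesis onto a known phrase.

    phrase_table is a hypothetical hash table from phrase text to an action
    label. max_ratio is an assumed tolerance: the largest accepted edit
    distance as a fraction of the longer string's length.
    """
    # Fast path: exact hash lookup, O(1) on average.
    if hypothesis in phrase_table:
        return hypothesis, phrase_table[hypothesis]

    # Fallback: linear scan for the nearest phrase by edit distance.
    best, best_dist = None, None
    for phrase in phrase_table:
        dist = levenshtein(hypothesis, phrase)
        if best_dist is None or dist < best_dist:
            best, best_dist = phrase, dist

    if best is not None and best_dist <= max_ratio * max(len(best), len(hypothesis)):
        return best, phrase_table[best]
    return None  # nothing close enough; treat as out of scene


if __name__ == "__main__":
    table = {"open the door": "OPEN_DOOR", "turn on the light": "LIGHT_ON"}
    print(match_phrase("opem the dor", table))
    print(match_phrase("xyz", table))
```

Under these assumptions, adding a phrase for a new scene only requires a new table entry, which is consistent with the abstract's point that the acoustic model need not be retrained.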