Towards End-to-End Speech Recognition with Deep Multipath Convolutional Neural Networks.

2019 
Approaches to deep learning have been used all over in connection to Automatic Speech Recognition (ASR), where they have achieved a high level of accuracy. This has mostly been seen in Convolutional Neural Network (CNN) which has recently been investigated in ASR. Due to the fact that CNN has an increased network’s depth on one branch, and may not be wide enough to work on capturing adequate features on signals of human speech. We focus on a proposal for an architecture that is deep and wide in CNN referred to as Multipath Convolutional Neural Network (MCNN). MCNN-CTC combines three additional paths with Connectionist Temporal Classification (CTC) objective function, and can be defined as an end-to-end system that has the ability to fully exploit spectral and temporal structures related to speech signals simultaneously. Results from the experiments show that the newly proposed MCNN-CTC structure enables a reduction in the error rate arising from the construction of end-to-end acoustic model. In the absence of a Language Model (LM), our proposed MCNN-CTC acoustic model has a relative reduction of 1.10%–12.08% comparing to the traditional HMM-based or DCNN-CTC-based models with strong generalization performance.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    26
    References
    6
    Citations
    NaN
    KQI
    []