RNN-T For Latency Controlled ASR With Improved Beam Search

Mahaveer Jain, Kjell Schubert, Jay Mahadeokar, Ching-Feng Yeh, Kaustubh Kalgaonkar, Anuroop Sriram, Christian Fuegen, Michael L. Seltzer


2019


Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization) into a single model. This greatly simplifies training and inference and hence makes RNN-T a desirable choice for ASR systems. In this work, we investigate the use of RNN-T in applications that require a tunable latency budget at inference time. We also improve the decoding speed of the originally proposed RNN-T beam search algorithm. We evaluate our proposed system on an English video ASR dataset and show that neural RNN-T models can achieve comparable WER and better computational efficiency than a well-tuned hybrid ASR baseline.
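For readers unfamiliar with the baseline being sped up, the sketch below illustrates the general shape of the originally proposed RNN-T beam search (Graves, 2012). It is not the improved algorithm from this work, and it is simplified: Graves' explicit prefix-merging step and length normalization are omitted, although probabilities are summed whenever the same label sequence is reached twice. The `joint(prefix, t)` callback, the `BLANK = 0` convention, and the `toy_joint` model are hypothetical stand-ins for a trained encoder, prediction network, and joint network.

```python
from typing import Callable, Dict, Sequence, Tuple

BLANK = 0  # assumed index of the blank symbol in this sketch
Hyp = Tuple[int, ...]
JointFn = Callable[[Hyp, int], Sequence[float]]  # hypothetical P(k | prefix, t)


def rnnt_beam_search(joint: JointFn, num_frames: int, beam: int) -> Hyp:
    """Simplified Graves-style RNN-T beam search (no prefix merging)."""
    B: Dict[Hyp, float] = {(): 1.0}  # hypotheses that have consumed frames < t
    for t in range(num_frames):
        A, B = B, {}  # A: hypotheses still expanding on frame t
        while A:
            best_a = max(A.values())
            # Stop once B holds `beam` hypotheses more probable than anything
            # left in A; further label expansions can only lower probability.
            if sum(1 for p in B.values() if p > best_a) >= beam:
                break
            y = max(A, key=A.get)
            p_y = A.pop(y)
            probs = joint(y, t)
            # Blank: y finishes frame t, so its mass moves to B.
            B[y] = B.get(y, 0.0) + p_y * probs[BLANK]
            # Non-blank label k: y + (k,) stays on frame t and may expand again.
            for k, p_k in enumerate(probs):
                if k != BLANK:
                    child = y + (k,)
                    A[child] = A.get(child, 0.0) + p_y * p_k
        # Prune to the `beam` most probable hypotheses.
        B = dict(sorted(B.items(), key=lambda kv: kv[1], reverse=True)[:beam])
    return max(B, key=B.get)


def toy_joint(prefix: Hyp, t: int) -> Sequence[float]:
    """Hypothetical toy model over {blank, 1, 2} that prefers emitting
    label target[n] on frame n, and blank otherwise."""
    target = (1, 2)
    n = len(prefix)
    if n < len(target) and prefix == target[:n] and t == n:
        probs = [0.1, 0.1, 0.1]
        probs[target[n]] = 0.8
        return probs
    return [0.8, 0.1, 0.1]


if __name__ == "__main__":
    print(rnnt_beam_search(toy_joint, num_frames=2, beam=2))  # (1, 2)
```

The inner `while` loop is what makes this search expensive: a single frame can trigger many expansions, each requiring a joint-network evaluation. This is the cost that a faster beam search would aim to reduce.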

Keywords:

Inference
Acoustic model
Speech recognition
Machine learning
Transducer
Decoding methods
Artificial intelligence
Language model
Text normalization
Beam search
Computer science
Latency (engineering)
