End-to-End Spoken Language Understanding using RNN-Transducer ASR.

Anirudh Raju, Gautam Tiwari, Milind Rao, Pranav Dheram, Bryan Anderson, Zhe Zhang, Bach Bui, Ariya Rastrow

2021

We propose an end-to-end trained spoken language understanding (SLU) system that extracts transcripts, intents, and slots from an input speech utterance. It consists of a streaming recurrent neural network transducer (RNNT) based automatic speech recognition (ASR) model connected to a neural natural language understanding (NLU) model through a neural interface. This interface allows end-to-end training with multi-task RNNT and NLU losses. Additionally, we introduce semantic sequence loss training for the joint RNNT-NLU system, which allows direct optimization of non-differentiable SLU metrics. This end-to-end SLU model paradigm can leverage state-of-the-art advancements and pretrained models from both the ASR and NLU research communities, and it outperforms both recently proposed direct speech-to-semantics models and conventional pipelined ASR and NLU systems. We show that this method improves ASR and NLU metrics on public SLU datasets as well as on large proprietary datasets.
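The two training objectives named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption: the multi-task loss is taken to be a weighted sum of the RNNT and NLU losses, and the semantic sequence loss is taken to be an expected-cost (minimum-risk style) objective over an n-best list, where the cost of each hypothesis is a non-differentiable SLU metric computed outside the model. The names `multitask_loss`, `expected_semantic_cost`, and `nlu_weight` are hypothetical and are not taken from the paper.

```python
import math


def multitask_loss(rnnt_loss, nlu_loss, nlu_weight=1.0):
    """Weighted sum of the ASR (RNNT) loss and the NLU loss.

    Assumption: a simple linear interpolation; the paper may weight
    or combine the terms differently.
    """
    return rnnt_loss + nlu_weight * nlu_loss


def expected_semantic_cost(log_probs, costs):
    """Expected cost over an n-best list of hypotheses.

    log_probs: model log-probabilities, one per hypothesis.
    costs: a per-hypothesis SLU error (for example, 1.0 if the intent
           or any slot is wrong, else 0.0). The cost itself is not
           differentiable; in a real system the gradient would flow
           through the renormalized posteriors instead.
    """
    # Renormalize over the n-best list with a numerically stable softmax.
    max_lp = max(log_probs)
    weights = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(weights)
    posteriors = [w / total for w in weights]
    return sum(p * c for p, c in zip(posteriors, costs))


# Two hypotheses: the first is semantically correct (cost 0), the second is not.
loss = expected_semantic_cost([math.log(0.9), math.log(0.1)], [0.0, 1.0])
```

In this sketch, shifting probability mass toward hypotheses with lower semantic cost lowers the loss, which is the sense in which a non-differentiable metric can be optimized directly.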
