ASR n-best fusion nets

Xinyue Liu, Mingda Li, Luoxin Chen, Prashan Wanigasekara, Weitong Ruan, Haidar Khan, Wael Hamza, Chengwei Su

2021

Current spoken language understanding (SLU) systems rely heavily on the best hypothesis (ASR 1-best) generated by automatic speech recognition (ASR), which serves as the input to downstream models such as natural language understanding (NLU) modules. However, errors and misrecognitions in the ASR 1-best pose challenges for NLU. NLU models usually struggle to recover from ASR errors without additional signals, which leads to suboptimal SLU performance. This paper proposes a fusion network that jointly considers the ASR n-best hypotheses for greater robustness to ASR errors. Experiments on Alexa data show that the model achieves a 21.71% error reduction on domain classification compared with a baseline trained on transcription.
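The abstract does not describe the fusion architecture, so the following is only a minimal sketch of the general idea of fusing n-best hypotheses before domain classification, not the authors' model. It assumes a toy bag-of-words embedding, softmax weights derived from hypothetical ASR log scores, and a hand-set linear classifier; every name and number in it is illustrative.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of log scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def embed(text, vocab):
    # Toy bag-of-words vector (assumption): count of each vocabulary word.
    tokens = text.lower().split()
    return [float(tokens.count(w)) for w in vocab]

def fuse_nbest(hypotheses, vocab):
    # Score-weighted average of hypothesis embeddings.
    # hypotheses: list of (text, asr_log_score) pairs, best first.
    weights = softmax([score for _, score in hypotheses])
    fused = [0.0] * len(vocab)
    for (text, _), w in zip(hypotheses, weights):
        for i, v in enumerate(embed(text, vocab)):
            fused[i] += w * v
    return fused

def classify(features, domain_weights):
    # Pick the domain whose weight vector has the largest dot product.
    scores = {
        domain: sum(f * w for f, w in zip(features, ws))
        for domain, ws in domain_weights.items()
    }
    return max(scores, key=scores.get)

# Hypothetical data: the 1-best misrecognizes "play" as "pay".
vocab = ["play", "pay", "music", "bill"]
nbest = [("pay music", -1.0), ("play music", -1.2), ("play musics", -2.5)]
domain_weights = {
    "Music": [1.0, 0.0, 1.0, 0.0],
    "Payments": [0.0, 2.0, 0.0, 1.0],
}

one_best_domain = classify(embed(nbest[0][0], vocab), domain_weights)
fused_domain = classify(fuse_nbest(nbest, vocab), domain_weights)
print(one_best_domain, fused_domain)
```

In this toy setup the 1-best alone is routed to the wrong domain, while the fused representation picks up evidence for "play" from the lower-ranked hypotheses. A learned fusion network would replace both the hand-set weights and the fixed averaging.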

Keywords: Speech recognition, Fusion, Computer science
