Multichannel Audio Front-End for Far-Field Automatic Speech Recognition

Amit Singh Chhetri, Philip Ryan Hilmes, Trausti Kristjansson, Wai Chu, Mohamed Mansour, Xiaoxue Li, Xianxian Zhang

2018

Far-field automatic speech recognition (ASR) is a key enabling technology that allows untethered, natural voice interaction between users and the Amazon Echo family of products. A key component in realizing far-field ASR on these products is the suite of audio front-end (AFE) algorithms, which mitigates acoustic environmental challenges and thereby improves ASR performance. In this paper, we discuss the key algorithms within the AFE and explain how they help mitigate the various acoustic challenges of far-field processing. We also describe the audio algorithm architecture adopted for the AFE and discuss ongoing and future research.

Keywords:

Speech recognition
Front and back ends
Architecture
Computer science
Near and far field
Suite
Signal processing algorithms
