WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit.

Binbin Zhang, Di Wu, Chao Yang, Chen Xiaoyu, Zhendong Peng, Xiangming Wang, Zhuoyuan Yao, Xiong Wang, Fan Yu, Lei Xie, Xin Lei

2021

In this paper, we present WeNet, a new open-source, production-first and production-ready end-to-end (E2E) speech recognition toolkit. The main motivation of WeNet is to close the gap between research on and production of E2E speech recognition models. WeNet provides an efficient way to ship automatic speech recognition (ASR) applications in several real-world scenarios, which is its main difference from, and advantage over, other open-source E2E speech recognition toolkits. This paper introduces WeNet from three aspects: model architecture, framework design, and performance metrics. Our experiments on AISHELL-1 using WeNet not only give a promising character error rate (CER) on a unified streaming and non-streaming two-pass (U2) E2E model but also show a reasonable real-time factor (RTF) and latency; both are favorable for production adoption. The toolkit is publicly available at https://github.com/mobvoi/wenet.
