Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Anjuli Kannan, Arindrima Datta, Tara N. Sainath, Eugene Weinstein, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, Seungji Lee

2019

A method (400) of transcribing speech using a multilingual end-to-end (E2E) speech recognition model (115) includes receiving audio data (110) for an utterance (106) spoken in a particular native language, obtaining a language vector (115) identifying the particular language, and processing, using the multilingual E2E speech recognition model, the language vector and acoustic features (117) derived from the audio data to generate a transcription (120) for the utterance. The multilingual E2E speech recognition model includes a plurality of language-specific adaptor modules (300) that include one or more adaptor modules specific to the particular native language and one or more other adaptor modules specific to at least one other native language different from the particular native language. The method also includes providing the transcription for output.
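
The data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the language vector is encoded, how it is combined with the acoustic features, or what the adaptor modules look like internally. The sketch assumes a one-hot language vector appended to every acoustic frame and a small residual bottleneck layer as each adaptor. The names `LANGUAGES`, `language_vector`, `append_language_vector`, `Adapter`, and `apply_adapters` are hypothetical, and the weights are random rather than trained.

```python
import random

# Hypothetical language inventory; the real model's set is not given here.
LANGUAGES = ["hi", "bn", "ta"]


def language_vector(lang):
    """Return a one-hot vector identifying `lang` (assumed encoding)."""
    return [1.0 if candidate == lang else 0.0 for candidate in LANGUAGES]


def append_language_vector(frames, lang):
    """Append the language vector to every acoustic feature frame."""
    vec = language_vector(lang)
    return [frame + vec for frame in frames]


class Adapter:
    """Assumed adaptor form: y = x + W_up * relu(W_down * x)."""

    def __init__(self, dim, bottleneck, rng):
        # Small random weights stand in for trained parameters.
        self.w_down = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.w_up = [[rng.gauss(0.0, 0.1) for _ in range(bottleneck)]
                     for _ in range(dim)]

    def __call__(self, x):
        # Project down to the bottleneck and apply ReLU.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)))
                  for row in self.w_down]
        # Project back up and add the residual connection.
        update = [sum(w * h for w, h in zip(row, hidden))
                  for row in self.w_up]
        return [xi + ui for xi, ui in zip(x, update)]


def apply_adapters(frames, lang, adapters):
    """Route every frame through the adaptor that belongs to `lang`."""
    adapter = adapters[lang]
    return [adapter(frame) for frame in frames]


rng = random.Random(0)
# Two toy 4-dimensional acoustic frames for one utterance.
frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
inputs = append_language_vector(frames, "bn")
# One adaptor per language; only the utterance's own adaptor is used.
adapters = {lang: Adapter(dim=len(inputs[0]), bottleneck=2, rng=rng)
            for lang in LANGUAGES}
outputs = apply_adapters(inputs, "bn", adapters)
```

In this sketch the adaptors for the other languages exist alongside the selected one but are left unused for the utterance, which mirrors the abstract's description of a single model holding adaptor modules for several languages.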
