ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding

2022 
Spoken Language Understanding (SLU) aims to interpret the meaning of human speech in order to support various human-machine interaction systems. A key technique for SLU is Automatic Speech Recognition (ASR), which transcribes speech signals into text. Because the output of modern ASR systems unavoidably contains errors, mainstream SLU models trained or tested on ASR-transcribed text are not sufficiently robust to such errors. We present ARoBERT, an ASR Robust BERT model, which can be fine-tuned to solve a variety of SLU tasks with noisy inputs. To guarantee the robustness of ARoBERT, during pre-training we reduce the fluctuation of language representations when parts of the input text are replaced by homophones or synophones. Specifically, we propose two novel self-supervised pre-training tasks for ARoBERT, namely Phonetically-aware Masked Language Modeling (PMLM) and ASR Model-adaptive Masked Language Modeling (AMMLM). The PMLM task explicitly fuses knowledge of word phonetic similarities into the pre-training process, forcing homophones and synophones to share similar representations. In AMMLM, a data-driven algorithm is further introduced to mine typical ASR errors so that ARoBERT can tolerate ASR model errors. In the experiments, we evaluate ARoBERT over multiple datasets. The results show the superiority of ARoBERT, which consistently outperforms strong baselines. We also show that ARoBERT outperforms state-of-the-art methods on a public benchmark. Currently, ARoBERT is deployed in an online production system, where it yields significant improvements.
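To make the PMLM idea concrete, below is a minimal sketch of phonetically-aware masking as a data-corruption step: selected tokens are either replaced by a phonetically similar word or by the mask token, and the model is trained to recover the original token. The function name pmlm_corrupt, the toy HOMOPHONES table, and all probabilities are illustrative assumptions, not the paper's actual implementation, which builds phonetic similarity from pronunciation knowledge at scale.

```python
import random

# Hypothetical homophone / near-homophone (synophone) lookup; in practice this
# would be derived from a pronunciation lexicon (phoneme or pinyin overlap).
HOMOPHONES = {
    "their": ["there", "they're"],
    "whole": ["hole"],
    "meet": ["meat"],
}

MASK_TOKEN = "[MASK]"

def pmlm_corrupt(tokens, mask_prob=0.15, homophone_prob=0.5, rng=random):
    """Corrupt a token sequence for phonetically-aware masked LM training.

    Each selected position becomes a prediction target. The token there is
    either swapped for a phonetically similar word (mimicking an ASR error)
    or replaced by [MASK]. Returns the corrupted tokens and the target
    labels, with None marking positions that are not predicted.
    """
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must reconstruct the original token
            candidates = HOMOPHONES.get(tok.lower(), [])
            if candidates and rng.random() < homophone_prob:
                corrupted.append(rng.choice(candidates))  # homophone substitution
            else:
                corrupted.append(MASK_TOKEN)
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels

if __name__ == "__main__":
    sentence = "they will meet at their whole office".split()
    noisy, targets = pmlm_corrupt(sentence, mask_prob=0.4, rng=random.Random(0))
    print(noisy)
    print(targets)
```

Training on such corrupted inputs encourages the encoder to map homophones and synophones to representations close to those of the clean words, which is the robustness property the PMLM objective targets.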