Data Augmentation for Voice-Assistant NLU using BERT-based Interchangeable Rephrase

Akhila Yerukola, Mason Bretan, Hongxia Jin

2021

We introduce a data augmentation technique based on byte pair encoding and a BERT-like self-attention model to boost performance on spoken language understanding tasks. We compare this method against a range of augmentation techniques, including generative models such as VAEs and performance-boosting techniques such as synonym replacement and back-translation. We show that our method performs strongly on domain and intent classification tasks for a voice assistant and in a user study focused on utterance naturalness and semantic similarity.

Keywords:

Natural language processing
Domain (software engineering)
Utterance
Computer science
Semantic similarity
Naturalness
Byte pair encoding
Synonym (database)
Spoken language
Generative grammar
Artificial intelligence

