PatchBERT: Just-in-Time, Out-of-Vocabulary Patching

2020 
Large-scale pre-trained language models have shown groundbreaking performance improvements for transfer learning in natural language processing. In our paper, we study a pre-trained multilingual BERT model, analyze the out-of-vocabulary (OOV) rate on downstream tasks, and show how it introduces information loss and, as a side effect, obstructs the potential of the underlying model. We then propose multiple mitigation approaches and demonstrate that they improve performance with the same parameter count when combined with fine-tuning.
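
To make the OOV analysis mentioned above concrete, the sketch below estimates the subword-level OOV rate of a corpus, i.e. the fraction of WordPiece tokens that fall back to the [UNK] token. It assumes the Hugging Face transformers library and the bert-base-multilingual-cased tokenizer; neither is prescribed by the paper, and the function name oov_rate is illustrative.

    # Minimal sketch (assumes Hugging Face transformers; not the paper's own code).
    from transformers import BertTokenizer

    def oov_rate(sentences, model_name="bert-base-multilingual-cased"):
        """Fraction of WordPiece tokens that map to the tokenizer's [UNK] token."""
        tokenizer = BertTokenizer.from_pretrained(model_name)
        total, unk = 0, 0
        for sentence in sentences:
            tokens = tokenizer.tokenize(sentence)
            total += len(tokens)
            unk += sum(1 for t in tokens if t == tokenizer.unk_token)
        return unk / total if total else 0.0

    # Example usage (any text collection works):
    # print(oov_rate(["An emoji like 🦆 is not covered by the multilingual BERT vocabulary."]))

A high value from such a measurement indicates that many input characters are being collapsed to a single unknown symbol before the model ever sees them, which is the information loss the abstract refers to.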