GigaBERT: A Bilingual BERT for English and Arabic
2020
Arabic is a morphologically rich language, posing many challenges for information extraction (IE) tasks, including Named Entity Recognition (NER), Part-of-Speech (POS) tagging, Argument Role Labeling (ARL), and Relation Extraction (RE). A few multilingual pre-trained models have been proposed and show good performance for Arabic; however, most experimental results are reported on language understanding tasks, such as natural language inference, question answering, and sentiment analysis. Their performance on IE tasks is less known, in particular the cross-lingual transfer capability from English to Arabic. In this work, we pre-train a Gigaword-based bilingual language model (GigaBERT) to study these two distant languages as well as zero-shot transfer learning on various IE tasks. Our GigaBERT outperforms multilingual BERT and monolingual AraBERT on these tasks, in both supervised and zero-shot learning settings.\footnote{We have made our pre-trained models publicly available at this https URL.}
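Since the abstract notes that the pre-trained checkpoints are publicly released, a minimal sketch of loading them with the Hugging Face transformers library is shown below. The model identifier used here is an assumption (the abstract only points to "this https URL"); substitute the name given on the authors' release page.

```python
# Minimal sketch: load a released GigaBERT checkpoint and extract contextual
# embeddings that could feed a downstream IE head (e.g., NER or RE).
from transformers import AutoTokenizer, AutoModel

# Hypothetical model identifier -- replace with the one from the authors' release.
model_name = "lanwuwei/GigaBERT-v4-Arabic-and-English"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode an English (or Arabic) sentence and obtain token-level representations.
inputs = tokenizer("GigaBERT is a bilingual model for English and Arabic.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```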