Addressing data sparsity for neural machine translation between morphologically rich languages

2020 
Translating between morphologically rich languages remains challenging for current machine translation systems. In this paper, we experiment with various neural machine translation (NMT) architectures to address the data sparsity problem caused by limited data availability (quantity), domain shift, and the morphological richness of the languages involved (Arabic and French). We show that the Factored NMT (FNMT) model, which uses linguistically motivated factors, outperforms standard NMT systems based on subword units by more than 1 BLEU point, even when a large quantity of data is available. Our work demonstrates the benefits of applying linguistic factors in NMT under both low- and high-resource conditions.
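To illustrate the idea behind factored input representations, the sketch below shows one common way of feeding linguistic factors to an NMT encoder: each source token is represented by several factor IDs (here, a hypothetical lemma, part-of-speech tag, and morphological tag), whose embeddings are concatenated into a single input vector. The specific factor set, embedding sizes, and class/variable names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class FactoredEmbedding(nn.Module):
    """Combine per-token factor embeddings (lemma, POS, morphology)
    into one encoder input vector via concatenation.

    Factor inventory and dimensions are illustrative assumptions."""

    def __init__(self, lemma_vocab, pos_vocab, morph_vocab,
                 lemma_dim=384, pos_dim=64, morph_dim=64):
        super().__init__()
        self.lemma = nn.Embedding(lemma_vocab, lemma_dim)
        self.pos = nn.Embedding(pos_vocab, pos_dim)
        self.morph = nn.Embedding(morph_vocab, morph_dim)

    def forward(self, lemma_ids, pos_ids, morph_ids):
        # Each ID tensor has shape (batch, seq_len); the output has
        # shape (batch, seq_len, lemma_dim + pos_dim + morph_dim) and
        # can be fed to any standard NMT encoder in place of a plain
        # word embedding.
        return torch.cat(
            [self.lemma(lemma_ids), self.pos(pos_ids), self.morph(morph_ids)],
            dim=-1,
        )


# Toy usage: one sentence of three factored tokens.
emb = FactoredEmbedding(lemma_vocab=1000, pos_vocab=20, morph_vocab=50)
lemmas = torch.tensor([[5, 17, 3]])
pos_tags = torch.tensor([[2, 7, 2]])
morph_tags = torch.tensor([[4, 1, 9]])
print(emb(lemmas, pos_tags, morph_tags).shape)  # torch.Size([1, 3, 512])
```

Concatenation keeps each factor's contribution explicit while sharing statistics across surface forms that differ only in inflection, which is one way factored models mitigate sparsity for morphologically rich languages.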