Making the Most of Synthetic Parallel Texts: Portuguese-Chinese Neural Machine Translation Enhanced with Back-Translation
2020
Back-translation, the generation of synthetic parallel corpora through the automatic translation of monolingual text, is a technique used to augment the amount of parallel data available for training Machine Translation systems. It is known to improve translation quality and thus to mitigate the lack of data for under-resourced language pairs. It is commonly assumed that, when training on synthetic parallel data, the original monolingual data should be used at the target side and its translation at the source side, an assumption that remains to be assessed. The contributions of this paper are twofold. First, we investigate the viability of using synthetic data to improve Neural Machine Translation for Portuguese-Chinese, an under-resourced language pair for which back-translation has yet to demonstrate its suitability. Second, we seek to fill another gap in the literature by experimenting with synthetic data not only at the source side but also, alternatively, at the target side. While demonstrating that, when appropriately applied, back-translation can enhance Portuguese-Chinese Neural Machine Translation, the results reported in this paper also confirm the current assumption that using the original monolingual data at the target side outperforms using them at the source side.
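The two data configurations contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function `synthetic_pairs`, its `original_side` parameter, and the placeholder `toy_translate` are hypothetical names introduced here, and a real setup would call a trained translation model instead of the placeholder.

```python
from typing import Callable, List, Tuple

# A training pair is (source sentence, target sentence).
Pair = Tuple[str, str]


def synthetic_pairs(
    monolingual: List[str],
    translate: Callable[[str], str],
    original_side: str = "target",
) -> List[Pair]:
    """Build synthetic parallel data from monolingual text.

    original_side="target": the original text is the target and its
    machine translation is the source (the commonly assumed setup).
    original_side="source": the original text is the source and its
    machine translation is the target (the alternative setup).
    """
    if original_side not in ("source", "target"):
        raise ValueError("original_side must be 'source' or 'target'")
    pairs: List[Pair] = []
    for sentence in monolingual:
        translation = translate(sentence)
        if original_side == "target":
            pairs.append((translation, sentence))
        else:
            pairs.append((sentence, translation))
    return pairs


def toy_translate(sentence: str) -> str:
    # Hypothetical stand-in for a trained MT system; it only tags the input.
    return f"<mt:{sentence}>"


if __name__ == "__main__":
    mono = ["Bom dia.", "Obrigado."]
    print(synthetic_pairs(mono, toy_translate, original_side="target"))
    print(synthetic_pairs(mono, toy_translate, original_side="source"))
```

In either configuration the synthetic pairs would then be added to the genuine parallel corpus before training; only the side on which the machine-translated text appears differs.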