More Data Is Better Only to Some Level, After Which It Is Harmful: Profiling Neural Machine Translation Self-learning with Back-Translation.

2021 
Neural machine translation needs a very large volume of data to unfold its potential. Self-learning with back-translation has become widely adopted to address this data scarcity bottleneck: a seed system translates source-language monolingual sentences, which are then aligned with its output sentences to form a synthetic data set that, when used to retrain the system, improves its translation performance. In this paper we report on the profiling of self-learning with back-translation, aiming to clarify whether adding more synthetic data always leads to an increase in performance. The experiments undertaken gathered evidence indicating that more synthetic data is better only up to some level, after which it becomes harmful as translation quality decays.
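
The self-learning loop described above can be summarized in a minimal sketch. All names here (Translator, train, translate, self_learning_round) are hypothetical placeholders for illustration, not the authors' actual pipeline; the synthetic_budget parameter merely reflects the paper's finding that synthetic data helps only up to some level.

    # Minimal sketch of self-learning with back-translation, assuming a
    # hypothetical Translator stand-in (not the authors' implementation).
    from typing import List, Tuple

    ParallelCorpus = List[Tuple[str, str]]  # (source sentence, target sentence)

    class Translator:
        """Stand-in for an NMT system; a real one would wrap e.g. a Transformer."""

        def __init__(self) -> None:
            self.corpus: ParallelCorpus = []

        def train(self, corpus: ParallelCorpus) -> None:
            # A real implementation would optimize model parameters here.
            self.corpus = list(corpus)

        def translate(self, sentence: str) -> str:
            # Placeholder output; a real system would decode with the model.
            return f"<translation of: {sentence}>"

    def self_learning_round(system: Translator,
                            authentic: ParallelCorpus,
                            monolingual: List[str],
                            synthetic_budget: int) -> Translator:
        """One round of self-learning: the seed system translates monolingual
        source sentences, inputs are aligned with outputs to form synthetic
        pairs, and the system is retrained on authentic + synthetic data.
        synthetic_budget caps how much synthetic data is added."""
        synthetic = [(src, system.translate(src))
                     for src in monolingual[:synthetic_budget]]
        retrained = Translator()
        retrained.train(authentic + synthetic)
        return retrained

    if __name__ == "__main__":
        seed = Translator()
        authentic = [("ola mundo", "hello world")]
        seed.train(authentic)
        improved = self_learning_round(
            seed,
            authentic=authentic,
            monolingual=["bom dia", "boa noite"],
            synthetic_budget=1,
        )
        print(len(improved.corpus))  # 1 authentic pair + 1 synthetic pair -> 2

In practice one would sweep synthetic_budget upward and measure translation quality at each level, which is the profiling the paper reports on.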