Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder.

Yunsu Kim, Jiahui Geng, Hermann Ney

2019

Unsupervised learning of cross-lingual word embeddings offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation with cross-lingual embeddings, using only monolingual corpora and no back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised neural translation systems without costly iterative training. We also analyze the effect of vocabulary size and denoising type on translation performance, which provides a better understanding of how cross-lingual word embeddings are learned and used in translation.
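
To illustrate what "context-aware search" with a language model can look like, the following is a minimal sketch and not the authors' implementation. It assumes each source word has a short list of candidate translations with a lexical score (standing in for an embedding similarity), a toy bigram language model, and a log-linear combination weighted by a hypothetical parameter `lam`. All names, scores, and the interpolation scheme are assumptions made for this example.

```python
import math

# Toy candidate translations with lexical scores (stand-ins for
# cross-lingual embedding similarities; values are invented).
CANDIDATES = {
    "gute": [("good", 0.9), ("well", 0.1)],
    "nacht": [("nights", 0.55), ("night", 0.45)],
}

# Toy bigram language model probabilities (invented values).
BIGRAM_PROB = {
    ("<s>", "good"): 0.2,
    ("good", "night"): 0.5,
}
UNSEEN_PROB = 1e-3  # crude floor for bigrams missing from the table


def lm_logprob(prev, word):
    """Log-probability of `word` following `prev` under the toy bigram LM."""
    return math.log(BIGRAM_PROB.get((prev, word), UNSEEN_PROB))


def translate(source, lam=0.5, beam_size=3):
    """Word-by-word beam search mixing lexical and LM scores.

    `lam` weights the lexical score and `1 - lam` weights the LM score;
    this weighting scheme is an assumption of the sketch.
    """
    beams = [(0.0, ["<s>"])]
    for src in source:
        expanded = []
        for score, hyp in beams:
            for tgt, lex in CANDIDATES[src]:
                step = lam * math.log(lex) + (1 - lam) * lm_logprob(hyp[-1], tgt)
                expanded.append((score + step, hyp + [tgt]))
        # Keep only the highest-scoring partial hypotheses.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_size]
    return beams[0][1][1:]  # drop the sentence-start marker


with_lm = translate(["gute", "nacht"], lam=0.5)
lexical_only = translate(["gute", "nacht"], lam=1.0)
```

In this toy setting, setting `lam=1.0` ignores the language model and picks the candidate with the highest lexical score for each word in isolation, while `lam=0.5` lets the preceding target word influence the choice.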

Keywords:

Natural language processing
Artificial intelligence
Word embedding
Autoencoder
Unsupervised learning
Noise reduction
Computer science
Language model
Vocabulary
Denoising autoencoder
