Enriching low resource Statistical Machine Translation using induced bilingual lexicons

Han Jingyi, Núria Bel Rafecas

2017

In this work we present an experiment in enriching a Statistical Machine Translation (SMT) phrase table with automatically created bilingual word pairs. The bilingual lexicon is induced with a supervised classifier whose features are a joint representation of the word embeddings (WE) and Brown clusters (BC) of translation-equivalent word pairs. The classifier reaches an F-score of 0.94, and the MT experiments show an improvement of up to +0.70 BLEU over a low-resource Chinese-Spanish phrase-based SMT baseline, demonstrating that the incorrect entries delivered by the classifier are well handled.
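The joint representation mentioned above could be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the embeddings and Brown clusters are combined, so the concatenation, the bit-string encoding of cluster paths, the 0.5 padding value, and every name and toy value in the block (`brown_bits`, `pair_features`, the two-dimensional vectors, the cluster paths) are assumptions made for illustration.

```python
from typing import Dict, List


def brown_bits(path: str, width: int) -> List[float]:
    """Encode a Brown cluster bit string (e.g. "0110") as a fixed-width vector.

    Assumption: paths longer than `width` are truncated and shorter ones are
    padded with 0.5, a neutral value between the two bit values.
    """
    bits = [float(ch) for ch in path[:width]]
    return bits + [0.5] * (width - len(bits))


def pair_features(
    src: str,
    tgt: str,
    src_emb: Dict[str, List[float]],
    tgt_emb: Dict[str, List[float]],
    src_bc: Dict[str, str],
    tgt_bc: Dict[str, str],
    width: int = 4,
) -> List[float]:
    """Build one feature vector for a candidate (source, target) word pair.

    Assumption: the joint representation is a plain concatenation of
    [source embedding, source cluster bits, target embedding, target cluster bits].
    """
    return (
        src_emb[src]
        + brown_bits(src_bc[src], width)
        + tgt_emb[tgt]
        + brown_bits(tgt_bc[tgt], width)
    )


# Toy, made-up values purely for illustration.
src_emb = {"猫": [0.1, 0.3]}
tgt_emb = {"gato": [0.2, 0.4]}
src_bc = {"猫": "0110"}
tgt_bc = {"gato": "011"}

features = pair_features("猫", "gato", src_emb, tgt_emb, src_bc, tgt_bc)
```

Vectors built this way for known translation pairs (positive examples) and for random non-translation pairs (negative examples) could then be passed to any binary classifier. The abstract does not specify which classifier was used.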

Keywords:

Computer science
Machine translation
Linguistics
Natural language processing
BLEU
Bilingual lexicon
Artificial intelligence
Classifier (linguistics)
Speech recognition
Phrase
Low resource
