Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences

Nikesh Garera, Chris Callison-Burch, David Yarowsky

2009

This paper presents novel improvements to the induction of translation lexicons from monolingual corpora using multilingual dependency parses. We introduce a dependency-based context model that incorporates long-range dependencies, variable context sizes, and reordering. It provides a 16% relative improvement over the baseline approach that uses a fixed context window of adjacent words. Its Top 10 accuracy for noun translation is higher than that of a statistical translation model trained on a Spanish-English parallel corpus containing 100,000 sentence pairs. We generalize the evaluation to other word types and show that the relative improvement can be increased to 18% by preserving part-of-speech equivalencies during translation.
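The contrast between a fixed window of adjacent words and a dependency-based context can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy Spanish sentence, its hand-written head indices, and the two-entry seed dictionary are all assumptions made for illustration, and the sketch uses only the head and direct dependents of a word as its dependency context.

```python
from collections import Counter
from math import sqrt

def window_contexts(tokens, index, size=2):
    """Baseline-style context: words within `size` positions of tokens[index]."""
    lo = max(0, index - size)
    hi = min(len(tokens), index + size + 1)
    return [tokens[i] for i in range(lo, hi) if i != index]

def dependency_contexts(tokens, heads, index):
    """Dependency-style context: the head and direct dependents of tokens[index].
    heads[i] is the index of token i's head, or -1 for the root (assumed format)."""
    context = []
    if heads[index] >= 0:
        context.append(tokens[heads[index]])
    context.extend(tokens[i] for i, h in enumerate(heads) if h == index)
    return context

def project(context_words, seed_dictionary):
    """Map context words into the target language through a seed dictionary,
    dropping words the dictionary does not cover."""
    return Counter(seed_dictionary[w] for w in context_words if w in seed_dictionary)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical example: "la casa que compramos ayer es grande"
tokens = ["la", "casa", "que", "compramos", "ayer", "es", "grande"]
heads = [1, 6, 3, 1, 3, 6, -1]  # hand-annotated for illustration only

window = window_contexts(tokens, 1, size=2)       # adjacent words around "casa"
dependency = dependency_contexts(tokens, heads, 1)  # includes distant "grande"

seed = {"grande": "big", "la": "the"}  # hypothetical seed dictionary
projected = project(dependency, seed)
```

In this toy sentence the window around "casa" misses "grande", which is five positions away, while the dependency context reaches it through the head link. A candidate translation would then be ranked by `cosine` between the projected vector and the context vector of each target-language word.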

Keywords:

Dependency grammar
Artificial intelligence
Natural language processing
Computer science
Lexicon
Noun
Part of speech
Context model
Sentence
Speech recognition
Context window
