The Contribution of Low Frequencies to Multilingual Sub-sentential Alignment: a Differential Associative Approach

2011 
The goal of this paper is to show that, contrary to preconceived ideas, low-frequency words can be used efficiently in natural language processing. We put them to use in sub-sentential alignment, which constitutes the first step of most data-driven machine translation systems (statistical or example-based). We show that rare words can serve as the foundation of a multilingual sub-sentential alignment method that uses differential techniques similar to those found in example-based machine translation. The method is truly multilingual, in that it allows the simultaneous processing of any number of languages. Moreover, it is very simple, anytime, and scales up naturally. We compare our implementation, Anymalign, with two well-established statistical tools. Although its current results are on average slightly behind those of state-of-the-art methods in phrase-based statistical machine translation, we show that the intrinsic quality of its lexicons is actually superior to that of lexicons produced by state-of-the-art methods.