Parallel text alignment using crosslingual information retrieval techniques

2000 
In this chapter, we demonstrate that aligning a sentence with its translation is not fundamentally different from finding a sentence on the same topic in the target corpus, using the source sentence as a query. The two processes are based on the semantic proximity of two sentences in different languages, and their major difference is that information retrieval only needs to insure that the sentence found contains most of the information of the query, whereas sentence alignment requires that the parts that are not common to both languages be as small as possible. A crosslingual query system can be used to obtain candidates for sentence alignment, provided that the measure of semantic proximity slightly modified. More classical techniques can be used, taking sequential order into account, but our approach is very robust to text desynchronization, such as missing text segments in one language, or texts such as glossaries or indexes that are not in the same order in different languages.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    10
    References
    10
    Citations
    NaN
    KQI
    []