Extracting the Translation of Anime Titles from Web Corpora Using CRF

2014 
Unknown words, whose translations are not listed in general dictionaries, have long been a problem in cross-language information retrieval and machine translation. Because new terms are created continually, it is difficult to cover all of them with general bilingual dictionaries. Research on the automatic extraction of translations for unknown words has therefore been carried out with the aim of building bilingual dictionaries at low cost from Web corpora. In this paper, we focus on anime titles, which are commercially important, and propose a method that uses Conditional Random Fields (CRF) to extract candidate Japanese translations corresponding to English anime titles. Because Japanese anime titles are in many cases transliterated when they are rendered in English, we used transliteration features in addition to features such as bag of words and part of speech. The experiments used at most one hundred Web pages retrieved from a search engine, with Japanese-English anime title pairs extracted from Wikipedia as queries. The results showed that the number of acquired titles increased significantly when the transliteration features were used.
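To make the idea of a transliteration feature concrete, the following is a minimal sketch and not the feature set used in the paper, which the abstract does not specify. It assumes a tiny hand-written katakana-to-romaji table, a string-similarity score from Python's `difflib`, and a 0.5 threshold; the names `KATAKANA_TO_ROMAJI`, `romanize`, `translit_similarity`, and `token_features` are all hypothetical. Each token of a Japanese Web page is given a feature dictionary of the kind commonly fed to a CRF sequence labeler.

```python
from difflib import SequenceMatcher

# Illustrative katakana-to-romaji table (an assumption; far from complete).
KATAKANA_TO_ROMAJI = {
    "ナ": "na", "ル": "ru", "ト": "to", "ド": "do", "ラ": "ra",
    "ゴ": "go", "ン": "n", "ボ": "bo", "ー": "",
}


def romanize(token):
    """Map each character through the table; unknown characters become '?'."""
    return "".join(KATAKANA_TO_ROMAJI.get(ch, "?") for ch in token)


def translit_similarity(japanese_token, english_title):
    """Similarity in [0, 1] between the romanized token and the English title."""
    romaji = romanize(japanese_token)
    target = english_title.lower().replace(" ", "")
    return SequenceMatcher(None, romaji, target).ratio()


def token_features(tokens, pos_tags, i, english_title):
    """Build a feature dict for token i: surface form, POS, context, transliteration."""
    sim = translit_similarity(tokens[i], english_title)
    return {
        "word": tokens[i],
        "pos": pos_tags[i],
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        # Assumed threshold of 0.5 for bucketing the similarity score.
        "translit": "high" if sim >= 0.5 else "low",
    }


if __name__ == "__main__":
    tokens = ["ナルト", "は"]
    pos_tags = ["NOUN", "PART"]
    for i in range(len(tokens)):
        print(token_features(tokens, pos_tags, i, "Naruto"))
```

In a setup like this, a token whose romanization closely matches the English title receives a `high` transliteration feature, which a CRF could learn to associate with the label marking a candidate translation.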