Semiautomatic Acquisition of Translation Templates from Monolingual Unannotated Chinese Patent Corpus

2013 
We propose a data-driven, unsupervised method for semiautomatically extracting translation templates from an unannotated Chinese patent corpus. The method has seven steps: morphological analysis, replace, filter, cluster, merge, sort and edit. After the preliminary templates are extracted, we manually edit them to obtain the final templates, which are used in a template-based machine translation system. The experimental results show that the method effectively improves the quality of machine translation, and that the template-based machine translation system outperforms the conventional rule-based machine translation system without templates.
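The abstract names the steps but does not describe how they work, so the following is only an illustrative sketch of what such a pipeline could look like, not the authors' implementation. It assumes the input is already word-segmented (standing in for morphological analysis), that "replace" means substituting numeric tokens with a slot, that "filter" keeps only patterns containing a slot, and that "cluster" and "merge" collapse identical patterns into one counted entry. The names `SLOT`, `analyze`, `replace`, `extract_templates` and the `min_count` threshold are all hypothetical. The final manual "edit" step is not modeled.

```python
import re
from collections import Counter

SLOT = "<NUM>"  # hypothetical slot marker, not taken from the paper


def analyze(sentence):
    # Stand-in for morphological analysis: the input is assumed to be
    # pre-segmented, with words separated by spaces.
    return sentence.split()


def replace(tokens):
    # Assumed "replace" step: turn purely numeric tokens into a slot.
    return tuple(SLOT if re.fullmatch(r"\d+", t) else t for t in tokens)


def extract_templates(sentences, min_count=2):
    patterns = [replace(analyze(s)) for s in sentences]
    # Assumed "filter" step: keep only patterns that contain a slot.
    patterns = [p for p in patterns if SLOT in p]
    # Assumed "cluster" and "merge" steps: identical patterns become one entry.
    counts = Counter(patterns)
    frequent = [(p, c) for p, c in counts.items() if c >= min_count]
    # "Sort" step: most frequent first, ties broken by the pattern itself.
    return sorted(frequent, key=lambda pc: (-pc[1], pc[0]))


corpus = [
    "根据 权利要求 1 所述 的 装置",
    "根据 权利要求 2 所述 的 装置",
    "本 发明 涉及 一 种 方法",
]
templates = extract_templates(corpus)
```

Here the two claim sentences differ only in the claim number, so they collapse into a single candidate template with a count of 2, while the third sentence is dropped because it has no slot. In the method described above, such candidates would then be edited by hand before use.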