Semiautomatic Acquisition of Translation Templates from Monolingual Unannotated Chinese Patent Corpus

Dechun Yin, Dakui Zhang

2013

We propose a data-driven, semiautomatic, unsupervised method for extracting translation templates from an unannotated Chinese patent corpus. The method comprises seven steps: morphological analysis, replace, filter, cluster, merge, sort, and edit. After extracting the preliminary templates, we manually edit them to obtain the final templates, which are used in a template-based machine translation system. The experimental results show that the method effectively improves the quality of machine translation, and that the template-based machine translation system outperforms the conventional rule-based machine translation system without templates.
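
The abstract names the seven steps but does not say how each one works, so the following is only an illustrative sketch of how such a pipeline might be arranged, not the authors' implementation. It assumes the morphological analysis has already produced token lists, that "replace" means substituting non-anchor words with a slot symbol, and that "cluster" and "merge" amount to grouping identical candidates and counting them. The names `SLOT`, `anchors`, `min_count`, and all function names are hypothetical, and the final manual "edit" step is left to a human.

```python
from collections import Counter

SLOT = "X"  # hypothetical placeholder for a variable span in a template


def replace(tokens, anchors):
    """Replace every non-anchor token with SLOT, collapsing adjacent slots."""
    out = []
    for tok in tokens:
        item = tok if tok in anchors else SLOT
        if item == SLOT and out and out[-1] == SLOT:
            continue  # keep a single slot for a run of non-anchor tokens
        out.append(item)
    return tuple(out)


def keep(template):
    """Filter: keep candidates with at least one slot and one anchor word."""
    return SLOT in template and any(t != SLOT for t in template)


def extract_templates(sentences, anchors, min_count=2):
    """Return (template, count) pairs, most frequent first."""
    candidates = (replace(s, anchors) for s in sentences)
    # Cluster and merge (assumed): identical candidates become one entry.
    counts = Counter(t for t in candidates if keep(t))
    frequent = [(t, c) for t, c in counts.items() if c >= min_count]
    # Sort by descending frequency, then by template for a stable order.
    return sorted(frequent, key=lambda tc: (-tc[1], tc[0]))


# Toy input: sentences already segmented into words (assumed upstream step).
sentences = [
    ["一种", "图像", "处理", "的", "方法"],
    ["一种", "数据", "传输", "的", "方法"],
    ["所述", "装置", "包括", "存储器"],
    ["所述", "系统", "包括", "处理器"],
    ["图像", "处理"],
]
anchors = {"一种", "的", "方法", "所述", "包括"}
templates = extract_templates(sentences, anchors)
```

On this toy input, the two surviving candidates are `一种 X 的 方法` and `所述 X 包括 X`, each seen twice; the last sentence is filtered out because it contains no anchor word. A human editor would then review the ranked list before the templates are used in translation.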

Keywords:

Machine translation
Sort
Natural language processing
Template
Merge (version control)
Artificial intelligence
Pattern recognition
Rule-based machine translation
Computer science
Machine translation system
Speech recognition
Chinese patent
