Training, Enhancing, Evaluating and Using MT Systems with Comparable Data

Bogdan Babych, Yu Chen, Andreas Eisele, Sabine Hunsicker, Mārcis Pinnis, Inguna Skadiņa, Raivis Skadiņš, Gregor Thurmair, Andrejs Vasiļjevs, Mateja Verlic, Xiaojun Zhang

2019

This chapter describes how semi-parallel and parallel data extracted from comparable corpora can be used to enhance machine translation (MT) systems. It covers the methods used for this task in statistical and rule-based MT systems and the showcases that illustrate the use of such enhanced systems. The impact of data extracted from comparable corpora on MT quality is evaluated for 17 language pairs, and detailed studies involving human evaluation are carried out for 11 language pairs. First, baseline statistical machine translation (SMT) systems were built using traditional SMT techniques. They were then improved by integrating additional data extracted from the comparable corpora, and a comparative evaluation was performed to measure the improvements. Comparable corpora were also used to enrich the linguistic knowledge of rule-based machine translation (RBMT) systems by applying terminology extraction technology. Finally, SMT systems were adapted to a narrow domain and extended with domain-specific knowledge such as terminology, named entities (NEs), and domain-specific language models (LMs).
