번역의 자동평가: 기계번역 평가를 인간번역 평가에 적용해보기 (Automatic Evaluation of Translation: Applying Machine Translation Evaluation to the Evaluation of Human Translation)

2018 
‘Appropriate’ evaluation of translation quality requires three criteria: validity, reliability and practicality. Human translation evaluation can be considered ‘valid’ if it is conducted by a professional evaluator, but it falls short on the other two criteria. Automatic evaluation, on the other hand, fulfills the second and the third, but misses the first criterion. It would therefore be desirable if the two methods could complement each other. This paper attempts to verify this possibility. Using an online tool (an interactive BLEU score evaluator), 150 pairs of translations were evaluated automatically and compared to the ratings of a human evaluator (a professional translator and evaluator). The results showed only slight inter-rater agreement (Kappa = .054, .109), but they suggest how the Kappa score might be improved: it can be increased if calculations are based on 1-gram matches (or even morphemes) instead of 1-4 gram matches, and if the reference translations contain a certain number of synonyms. With regard to the validity of BLEU, the Brevity Penalty should be replaced by a Length Penalty (Han et al., 2012) in accordance with the characteristics of human translation.
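To make the 1-gram versus 1-4 gram comparison mentioned in the abstract concrete, the following is a minimal sketch of sentence-level BLEU in Python. It is not the interactive BLEU evaluator used in the study; the example sentences are invented, and the implementation simply follows the standard definition of clipped n-gram precision and the Brevity Penalty, so that restricting `max_n` to 1 reproduces the 1-gram-only variant discussed above.

```python
# Minimal sketch of sentence-level BLEU (not the authors' actual tool).
# Example sentences are invented for illustration.
from collections import Counter
import math


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(hypothesis, references, n):
    """Clipped n-gram precision: each hypothesis n-gram counts at most
    as often as it occurs in the best-matching reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
    return clipped / max(sum(hyp_counts.values()), 1)


def bleu(hypothesis, references, max_n=4):
    """Geometric mean of 1..max_n precisions times the Brevity Penalty.
    Setting max_n=1 gives the 1-gram-only score discussed in the abstract."""
    precisions = [modified_precision(hypothesis, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity Penalty: penalise hypotheses shorter than the closest reference.
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - len(hypothesis)), rl))
    bp = 1.0 if len(hypothesis) > ref_len else math.exp(1 - ref_len / max(len(hypothesis), 1))
    return bp * math.exp(log_avg)


if __name__ == "__main__":
    ref = ["the", "evaluation", "was", "carried", "out", "by", "a", "professional", "translator"]
    hyp = ["the", "assessment", "was", "done", "by", "a", "professional", "translator"]
    print("BLEU (1-4 gram):", round(bleu(hyp, [ref], max_n=4), 3))   # ~0.34
    print("BLEU (1-gram only):", round(bleu(hyp, [ref], max_n=1), 3))  # ~0.66
```

As the toy example shows, a human translation that paraphrases the reference ("assessment" for "evaluation", "done" for "carried out") is punished much more heavily by higher-order n-grams than by 1-gram matching, which is the kind of effect that motivates the abstract's suggestion to base the calculation on 1-gram (or morpheme) matches and on references enriched with synonyms.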