Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task

An Yang, Kai Liu, Jing Liu, Yajuan Lyu, Sujian Li

2018

Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems, such as ROUGE and BLEU, generally focus on the lexical overlap between candidate and reference answers. However, these metrics can be biased when applied to specific question types, especially questions asking for yes-no opinions or lists of entities. In this paper, we adapt the metrics so that n-gram overlap correlates better with human judgment for answers to these two question types. Statistical analysis demonstrates the effectiveness of our approach. Our adaptations may provide positive guidance for the development of real-world MRC systems.
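
To make the yes-no bias concrete, the following minimal sketch computes a plain sentence-level ROUGE-L F-measure (longest common subsequence over whitespace tokens). It is not the adapted metric proposed in the paper. The example sentences, the whitespace tokenization, and the recall weight `beta = 1.2` are illustrative assumptions. It shows that a candidate stating the opposite opinion can outscore a correct paraphrase, because only one token differs from the reference.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """Sentence-level ROUGE-L F-measure on whitespace tokens.

    beta = 1.2 is an assumed setting that weights recall above precision;
    it is not taken from the paper.
    """
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return (1 + beta ** 2) * recall * precision / (recall + beta ** 2 * precision)


# Hypothetical yes-no example: the reference answer says "yes".
reference = "yes drinking coffee daily is safe"
wrong_opinion = "no drinking coffee daily is safe"          # opposite opinion, one token changed
right_paraphrase = "yes it is safe to drink coffee daily"   # same opinion, reworded

print(round(rouge_l(wrong_opinion, reference), 3))     # 0.833
print(round(rouge_l(right_paraphrase, reference), 3))  # 0.44
```

The wrong answer shares five of six tokens in order with the reference, so its LCS-based score stays high even though it flips the opinion. This is the kind of mismatch with human judgment that motivates question-type-specific adaptations.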

Keywords:

Natural language processing
BLEU
Artificial intelligence
Computer science
Reading comprehension
ROUGE
Machine reading
Comprehension
