Human evaluation of automatically generated text: Current trends and best practice guidelines

2021 
Abstract Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are also linked to the stages that researchers go through when conducting evaluation research (the planning stage, and the execution and release stage), and to the specific steps within those stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.