Human evaluation of automatically generated text: Current trends and best practice guidelines

2021 
Abstract Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are also linked to the stages that researchers go through when conducting evaluation research (the planning stage, and the execution and release stage), and to the specific steps within those stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.