A Semi-automated Evaluation Metric for Dialogue Model Coherence

2016 
We propose a new metric, Voted Appropriateness, which can be used to automatically evaluate dialogue policy decisions, once some wizard data has been collected. We show that this metric outperforms a previously proposed metric Weak agreement. We also present a taxonomy for dialogue model evaluation schemas, and orient our new metric within this taxonomy.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    18
    References
    14
    Citations
    NaN
    KQI
    []