Creating Training Corpora for NLG Micro-Planners

Claire Gardent, Anastasia Shimorina, Shashi Narayan, Laura Perez-Beltrachini

2017

In this paper, we focus on how to create data-to-text corpora that can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation (NLG). We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases (KBs). We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while the dataset of Wen et al. (2016) is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We therefore propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models capable of generating text from KB data can be learned.
