SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

Bogdan Gliwa, Iwona Mochol, Maciej Biesek, Aleksander Wawer

2019

This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than model-generated summaries of news, in contrast with human evaluators’ judgement. This suggests that the challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality chat-dialogue corpus, manually annotated with abstractive summaries, which can be used by the research community for further studies.

Keywords:

Research community
Natural language processing
Artificial intelligence
Automatic summarization
Computer science
Judgement
