Multi-Document Summarization Using Complex and Rich Features

Maria Lucía del Rosario Castro Jorge, Verônica Agostini, Thiago Alexandre Salgueiro Pardo

2010

Multi-document summarization consists of automatically producing a single informative summary from a collection of texts on the same topic. In this paper, we model multi-document summarization as a machine learning classification problem in which each sentence from the source texts is classified as belonging or not belonging to the summary. To this end, we combine superficial features (e.g., sentence position in the text) with deep linguistic features (e.g., semantic relations across documents). In particular, the linguistic features are given by Cross-document Structure Theory (CST). We conduct our experiments on a CST-annotated corpus of news texts. The results show that the linguistic features help to produce a better classification model, yielding state-of-the-art results.
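
The sentence-classification framing described in the abstract can be illustrated with a minimal sketch. The abstract does not name the learning algorithm or the exact feature set, so everything here is an assumption made for illustration: the `Sentence` record, the `cst_relations` count standing in for the CST-based features, the two-feature vector, the simple perceptron used as the classifier, and the toy training data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sentence:
    text: str
    position: int       # 0-based index of the sentence within its source text
    doc_length: int     # number of sentences in that source text
    cst_relations: int  # hypothetical count of CST links to sentences in other texts


def features(s: Sentence) -> List[float]:
    """Map a sentence to [bias, superficial feature, deep feature]."""
    # Superficial feature: 1.0 for the first sentence, 0.0 for the last.
    rel_pos = 1.0 - s.position / max(s.doc_length - 1, 1)
    # Deep feature (assumed form): CST relation count squashed into [0, 1).
    cst = s.cst_relations / (1.0 + s.cst_relations)
    return [1.0, rel_pos, cst]


def classify(w: List[float], s: Sentence) -> int:
    """Return 1 if the sentence is predicted to belong to the summary, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, features(s)))
    return 1 if score > 0 else 0


def train_perceptron(data: List[Tuple[Sentence, int]], epochs: int = 50) -> List[float]:
    """Train a plain perceptron; this is a stand-in, not the paper's classifier."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for s, label in data:
            if classify(w, s) != label:
                sign = 1.0 if label == 1 else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, features(s))]
                mistakes += 1
        if mistakes == 0:  # every training sentence is classified correctly
            break
    return w


def select_summary(w: List[float], sentences: List[Sentence]) -> List[str]:
    """Keep the text of every sentence classified as belonging to the summary."""
    return [s.text for s in sentences if classify(w, s) == 1]


# Invented toy data: (sentence, 1 if it belongs in the summary else 0).
TOY_DATA: List[Tuple[Sentence, int]] = [
    (Sentence("Lead sentence repeated across sources.", 0, 5, 3), 1),
    (Sentence("Closing remark found in one source only.", 4, 5, 0), 0),
    (Sentence("Key detail confirmed by two other texts.", 1, 5, 2), 1),
    (Sentence("Background aside with no counterpart.", 3, 5, 0), 0),
]

if __name__ == "__main__":
    weights = train_perceptron(TOY_DATA)
    print(select_summary(weights, [s for s, _ in TOY_DATA]))
```

In this sketch, dropping the third element of the feature vector would leave a position-only baseline, which is one way to compare a superficial-features model against one that also uses cross-document relations.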

Keywords:

Natural language processing
Structure (category theory)
Information retrieval
Automatic summarization
Statistical classification
Sentence
Multi-document summarization
Artificial intelligence
Computer science
