Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction

François Rousseau, Michalis Vazirgiannis

2015

In this paper, we apply the concept of k-core to the graph-of-words representation of text for single-document keyword extraction, retaining only the nodes of the main core as representative terms. This approach better accounts for proximity between keywords and for variability in the number of extracted keywords, because it selects more cohesive subsets of nodes than existing graph-based approaches that rely solely on centrality. Experiments on two standard datasets show statistically significant improvements in F1-score and in the AUC of the precision/recall curve compared to baseline results, in particular when the edges of the graph are weighted by the number of co-occurrences. To the best of our knowledge, this is the first application of graph degeneracy to natural language processing and information retrieval.
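The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the input is already tokenized and preprocessed (for example, stopwords removed), uses a hypothetical default sliding window of 4 terms, and models the weighted variant as peeling on the sum of co-occurrence counts (a generalized core). The function names `graph_of_words`, `core_numbers` and `main_core` are illustrative only.

```python
from collections import defaultdict


def graph_of_words(tokens, window=4):
    """Build an undirected graph-of-words: terms co-occurring within a sliding
    window of `window` consecutive tokens are linked, and each edge weight
    counts how many times that pair co-occurred (window size is an assumption)."""
    adj = defaultdict(lambda: defaultdict(int))
    for i, term in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != term:  # skip self-loops
                adj[term][other] += 1
                adj[other][term] += 1
    return {u: dict(nbrs) for u, nbrs in adj.items()}


def core_numbers(adj, weighted=False):
    """k-core decomposition by peeling: repeatedly remove the vertex with the
    smallest remaining degree. With weighted=True the degree is the sum of
    edge weights (co-occurrence counts) instead of the neighbor count."""
    deg = {v: (sum(nbrs.values()) if weighted else len(nbrs))
           for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        # Ties broken alphabetically so the peeling order is deterministic.
        v = min(remaining, key=lambda u: (deg[u], u))
        k = max(k, deg[v])  # core number never decreases during peeling
        core[v] = k
        remaining.remove(v)
        for u, w in adj[v].items():
            if u in remaining:
                deg[u] -= w if weighted else 1
    return core


def main_core(adj, weighted=False):
    """Return the terms of the main core, i.e. those with the highest core number."""
    core = core_numbers(adj, weighted)
    if not core:
        return set()
    top = max(core.values())
    return {v for v, c in core.items() if c == top}


if __name__ == "__main__":
    tokens = "graph degeneracy k core graph of words keyword extraction graph".split()
    g = graph_of_words(tokens, window=4)
    print(sorted(main_core(g)))                  # unweighted main core
    print(sorted(main_core(g, weighted=True)))   # weighted by co-occurrence counts
```

The peeling loop uses `min()` over the remaining vertices for readability, which is quadratic in the number of terms; a bucket-based implementation (Batagelj and Zaversnik) computes unweighted core numbers in linear time in the number of edges. Because terms that co-occur repeatedly carry heavier edges, the weighted variant tends to retain a tighter set of keywords than the unweighted one.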

Keywords:

Keyword extraction
Centrality
Computer science
Information retrieval
Recall
Weighting
Degeneracy (mathematics)
Graph
