Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction

2015 
In this paper, we apply the concept of k-core to the graph-of-words representation of text for single-document keyword extraction, retaining only the nodes of the main core as representative terms. By selecting more cohesive subsets of nodes than existing graph-based approaches that rely solely on centrality, this approach better accounts for proximity between keywords and for variability in the number of extracted keywords. Experiments on two standard datasets show statistically significant improvements in F1-score and in the area under the precision/recall curve (AUC) over baseline results, in particular when the edges of the graph are weighted by the number of co-occurrences. To the best of our knowledge, this is the first application of graph degeneracy to natural language processing and information retrieval.
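The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_graph_of_words` and `main_core`, the default window size of 4, and the absence of any preprocessing (stop-word removal, stemming) are assumptions made for illustration. The sketch builds an undirected co-occurrence graph over a sliding window, then finds the main core by repeatedly peeling off a minimum-degree node; with `weighted=True`, degree is the sum of co-occurrence counts rather than the number of neighbors.

```python
from collections import defaultdict


def build_graph_of_words(tokens, window=4):
    """Build an undirected co-occurrence graph.

    Two terms are linked if they appear within `window` consecutive tokens;
    the edge weight is the number of such co-occurrences.
    (Window size 4 is an assumed default, not taken from the abstract.)
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, term in enumerate(tokens):
        graph[term]  # make sure every term has a node, even if isolated
        for other in tokens[i + 1 : i + window]:
            if other != term:  # no self-loops
                graph[term][other] += 1
                graph[other][term] += 1
    return {u: dict(nbrs) for u, nbrs in graph.items()}


def main_core(graph, weighted=False):
    """Return (k, nodes) for the main core, i.e. the k-core with maximal k.

    Nodes are removed in order of minimum remaining degree; the core number
    of a node is the largest minimum degree seen up to its removal.
    """
    degree = {
        u: (sum(nbrs.values()) if weighted else len(nbrs))
        for u, nbrs in graph.items()
    }
    core = {}
    k = 0
    while degree:
        u = min(degree, key=degree.get)  # a node of minimum remaining degree
        k = max(k, degree[u])
        core[u] = k
        for v, w in graph[u].items():
            if v in degree:  # neighbor has not been removed yet
                degree[v] -= w if weighted else 1
        del degree[u]
    k_max = max(core.values(), default=0)
    return k_max, {u for u, c in core.items() if c == k_max}


if __name__ == "__main__":
    text = "graph of words main core graph degeneracy main core keyword extraction"
    g = build_graph_of_words(text.split(), window=4)
    k, keywords = main_core(g, weighted=True)
    print(k, sorted(keywords))
```

Because the whole main core is retained, the number of extracted terms varies with the structure of each document's graph rather than being fixed in advance, which is the variability the abstract refers to.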