Graph-Based Term Weighting Scheme for Topic Modeling

2016 
LSI and LDA are widely used techniques for uncovering the underlying topical structure of text. They traditionally rely on the bag-of-words representation of documents and on term-frequency (TF) weighting schemes. In this paper, we represent documents as graphs-of-words to capture the relationships between nearby words, and we propose using the number of contexts in which a term co-occurs as an alternative term weight (TW). Experiments on a downstream supervised task show that weighting terms by their importance as nodes in the graph yields statistically significantly higher accuracy and macro-averaged F1-score than TF-based LSI and LDA.
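The abstract does not spell out the graph construction, so the following is only a minimal Python sketch under stated assumptions: an undirected, unweighted graph whose edges link terms that co-occur within a sliding window (a window of 4 tokens is chosen purely for illustration), with each term weighted by its node degree, i.e. the number of distinct terms it co-occurs with. The helper names `graph_of_words`, `tw_weights`, `tf_weights`, and `weighted_matrix` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def graph_of_words(tokens, window=4):
    """Build an undirected, unweighted graph-of-words (assumed construction).

    Two distinct terms are linked if they appear within `window`
    consecutive tokens of each other. The window size is an assumption.
    """
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        # Link `term` to each of the following window-1 tokens.
        for other in tokens[i + 1:i + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)
        # Make sure every term gets a node, even with no neighbors.
        neighbors.setdefault(term, set())
    return neighbors


def tw_weights(tokens, window=4):
    """Graph-based term weight: node degree (distinct co-occurring terms)."""
    graph = graph_of_words(tokens, window)
    return {term: len(adj) for term, adj in graph.items()}


def tf_weights(tokens):
    """Baseline weight: raw term frequency."""
    counts = defaultdict(int)
    for token in tokens:
        counts[token] += 1
    return dict(counts)


def weighted_matrix(docs, weight_fn):
    """Document-term matrix built with an arbitrary per-document weighting."""
    vocab = sorted({t for doc in docs for t in doc})
    rows = []
    for doc in docs:
        weights = weight_fn(doc)
        rows.append([weights.get(t, 0) for t in vocab])
    return vocab, rows


if __name__ == "__main__":
    doc = "the cat sat on the mat".split()
    print("TF:", tf_weights(doc))
    print("TW:", tw_weights(doc, window=2))
```

For "the cat sat on the mat" with `window=2`, TF gives "the" a weight of 2, while TW gives it 3 because it is adjacent to "cat", "on", and "mat". A matrix from `weighted_matrix(docs, tw_weights)` could then replace the TF matrix fed to LSI or LDA; whether this matches the paper's exact window, edge direction, or normalization is an assumption.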