Semantically-Enhanced Topic Modeling

2018 
In this paper, we advance the state-of-the-art in topic modeling by means of the design and development of a novel (semi-formal) general topic modeling framework. The novel contributions of our solution include: (i) the introduction of new semantically-enhanced data representations for topic modeling based on pooling, and (ii) the proposal of a novel topic extraction strategy - ASToC - that solves the difficulty in representing topics in our semantically-enhanced information space. In our extensive experimentation evaluation, covering 12 datasets and 12 state-of-the-art baselines, totalizing 108 tests, we exceed (with a few ties) in almost 100 cases, with gains of more than 50% against the best baselines (achieving up to 80% against some runner-ups). We provide qualitative and quantitative statistical analyses of why our solutions work so well. Finally, we show that our method is able to improve document representation in automatic text classification.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    29
    References
    8
    Citations
    NaN
    KQI
    []