TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement

2021 
Short texts have been prevalent on websites and emerging social media for several years, which makes identifying intelligible topics from online data sources a critical task. However, existing topic models for short texts cannot analyze the internal components of the learned topics, an ability that is significant for improving the coherence and interpretability of topics. In this paper, we propose a novel topic model for short texts, named TSSE-DMM, which improves the coherence and interpretability of topics through topic subdivision and alleviates text sparsity through a semantic enhancement strategy. First, we subdivide each topic into four detailed aspects, namely the location aspect, the people & organization aspect, the core word aspect, and the background word aspect, to obtain distinct and interpretable components of topics. Then, we combine the Generalized Polya Urn model with the joint word embedding to address data sparsity. Extensive experiments carried out on three real-world text collections in two languages show that our model achieves better topic representations than the baseline methods. Moreover, our method has been adopted by the public service hotline platform of Jiangsu Province in China.
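To illustrate the semantic enhancement idea in general terms, the following is a minimal sketch of a Generalized Polya Urn style count update, in which assigning a word to a topic also adds a fractional count for semantically related words. It is not the TSSE-DMM implementation: the function name `gpu_update`, the `related` lookup (which in practice would be derived from word-embedding similarity), and the `promotion` weight of 0.3 are all assumptions made for this example.

```python
from collections import defaultdict


def gpu_update(topic_word_counts, topic, word, related, promotion=0.3, sign=1):
    """Add (sign=1) or remove (sign=-1) a word under a topic, Polya-urn style.

    The word itself contributes a full count; each related word contributes
    a fractional `promotion` count. All names here are hypothetical.
    """
    counts = topic_word_counts[topic]
    counts[word] += sign * 1.0
    for neighbour in related.get(word, ()):
        counts[neighbour] += sign * promotion


# Hypothetical relatedness table; assumed to come from embedding similarity.
related = {"flood": ["rain", "river"]}

# topic id -> word -> (possibly fractional) count
topic_word_counts = defaultdict(lambda: defaultdict(float))

# Assigning "flood" to topic 0 also promotes "rain" and "river" in topic 0.
gpu_update(topic_word_counts, 0, "flood", related)
```

Calling the same function with `sign=-1` undoes the update, which is the step a Gibbs sampler would take before resampling a document's topic. Promoting related words is what lets a sparse short text contribute evidence for words it does not literally contain.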