Online Topic Modeling for Short Texts

2020 
Retrieving knowledge from short texts has attracted considerable attention, since discovering topics in them can surface hidden information. Many applications need these topics to be learned on the fly from streaming short texts. In this work, we propose an online topic discovery algorithm (OTDA) for short texts. To compensate for the sparse word co-occurrence information in short texts, OTDA exploits word-context semantic correlations derived from a skip-gram view of the corpus, following the semantics-assisted non-negative matrix factorization (SeaNMF) model of Shi et al. OTDA processes one data point, or one chunk of data points, at a time instead of keeping the entire data set in memory, and it is also memoryless. We conduct experiments on a couple of public data sets and one internal data set, using both one-pass and multi-pass iterations of the proposed algorithm. The results show encouraging performance of OTDA in terms of average Frobenius loss, topic coherence, normalized mutual information (NMI), and emerging topic detection.
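The abstract gives no update equations, so the following is only a minimal sketch of the kind of computation it describes, not the authors' OTDA. It assumes a SeaNMF-style objective without regularizers, ||A - W H^T||_F^2 + alpha * ||S - W Wc^T||_F^2, where A is a word-by-document count matrix, S is a word-context correlation matrix, and W, Wc, H are non-negative word, context, and document factors. Building S as a shifted positive PMI matrix, with each short text treated as one skip-gram context window, is an assumption. So is the online scheme: W and Wc are warm-started across chunks, H is fitted per chunk and then discarded, and no past chunk is stored. Lee-Seung multiplicative updates stand in for whatever solver OTDA actually uses. The names `sppmi`, `doc_term`, `process_chunk`, `alpha`, `shift`, and `inner_iters` are hypothetical.

```python
import numpy as np

def sppmi(docs, vocab_size, shift=1.0):
    """Shifted positive PMI between words and contexts (assumed construction):
    every other word in the same short text counts as a context."""
    counts = np.zeros((vocab_size, vocab_size))
    for doc in docs:
        for i, w in enumerate(doc):
            for j, c in enumerate(doc):
                if i != j:
                    counts[w, c] += 1.0
    total = counts.sum()
    if total == 0:
        return counts
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0          # unseen pairs contribute nothing
    return np.maximum(pmi - np.log(shift), 0.0)

def doc_term(docs, vocab_size):
    """Word-by-document count matrix A for one chunk."""
    A = np.zeros((vocab_size, len(docs)))
    for d, doc in enumerate(docs):
        for w in doc:
            A[w, d] += 1.0
    return A

def process_chunk(W, Wc, docs, alpha=1.0, inner_iters=50, rng=None, eps=1e-10):
    """Hypothetical online step: fit H for this chunk, refine W and Wc in place."""
    if rng is None:
        rng = np.random.default_rng(0)
    M, K = W.shape
    A, S = doc_term(docs, M), sppmi(docs, M)
    H = rng.random((A.shape[1], K))
    seen = A.sum(axis=1) > 0              # only words present in this chunk are updated
    for _ in range(inner_iters):
        H *= (A.T @ W) / (H @ (W.T @ W) + eps)
        W_new = W * (A @ H + alpha * S @ Wc) / (
            W @ (H.T @ H) + alpha * W @ (Wc.T @ Wc) + eps)
        W[seen] = W_new[seen]
        Wc_new = Wc * (S.T @ W) / (Wc @ (W.T @ W) + eps)
        Wc[seen] = Wc_new[seen]
    loss = np.linalg.norm(A - W @ H.T, "fro") ** 2   # chunk Frobenius loss
    return W, Wc, H, loss

# Toy stream of two chunks over a 6-word vocabulary, 2 topics.
rng = np.random.default_rng(0)
W, Wc = rng.random((6, 2)), rng.random((6, 2))
stream = [[[0, 1, 2], [0, 1], [1, 2]],    # chunk 1 uses words 0-2
          [[3, 4, 5], [3, 4], [4, 5]]]    # chunk 2 uses words 3-5
losses = []
for chunk in stream:
    W, Wc, H, loss = process_chunk(W, Wc, chunk, rng=rng)
    losses.append(loss)
top_words = np.argsort(-W, axis=0)[:3]    # top-3 word ids per topic
```

The row mask is a design choice for the sketch: multiplicative updates can never revive an entry once it reaches zero, so without it every word absent from the current chunk would be permanently erased from W. Averaging `losses` over the stream is one possible reading of "average Frobenius loss".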