Examining LDA2Vec and Tweet Pooling for Topic Modeling on Twitter Data

2021 
The short length of tweets makes it difficult for topic modeling to extend beyond what hashtags provide explicitly. This is particularly true for LDA-based methods because the amount of information available from per-tweet statistical analysis is severely limited. In this paper we present LDA2Vec paired with temporal tweet pooling (LDA2VecTTP) and assess its performance on this problem relative to traditional LDA and to the Biterm Topic Model (Biterm), which was developed specifically for topic modeling on short text documents. We paired each of the three topic modeling algorithms with three tweet pooling schemes: no pooling, author-based pooling, and temporal pooling. We then conducted topic modeling on two Twitter datasets using each combination of algorithm and pooling scheme. Our results on the larger dataset suggest that LDA2VecTTP can produce higher coherence scores and more logically coherent and interpretable topics.
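
The three pooling schemes named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tweet record layout (`author`, `created_at`, `text`), the function name `pool_tweets`, and the one-hour default window are all assumptions made for illustration, since the abstract does not specify how the temporal windows were defined.

```python
from collections import defaultdict
from datetime import datetime, timezone


def pool_tweets(tweets, scheme="none", window_hours=1):
    """Merge tweets into pseudo-documents for topic modeling.

    tweets: list of dicts with hypothetical keys 'author', 'created_at'
            (a timezone-aware datetime), and 'text'.
    scheme: 'none', 'author', or 'temporal'.
    window_hours: assumed width of a temporal pool; not taken from the paper.
    """
    if scheme == "none":
        # No pooling: every tweet is its own document.
        return [t["text"] for t in tweets]

    pools = defaultdict(list)
    for t in tweets:
        if scheme == "author":
            # Author-based pooling: one document per author.
            key = t["author"]
        elif scheme == "temporal":
            # Temporal pooling: one document per fixed-width time window.
            hour_index = int(t["created_at"].timestamp() // 3600)
            key = hour_index // window_hours
        else:
            raise ValueError(f"unknown pooling scheme: {scheme!r}")
        pools[key].append(t["text"])

    # Concatenate the tweets in each pool, in order of first appearance.
    return [" ".join(texts) for texts in pools.values()]


# Illustrative data only; not drawn from the paper's datasets.
sample = [
    {"author": "a", "created_at": datetime(2021, 1, 1, 10, 5, tzinfo=timezone.utc),
     "text": "storm warning #weather"},
    {"author": "b", "created_at": datetime(2021, 1, 1, 10, 50, tzinfo=timezone.utc),
     "text": "heavy rain downtown"},
    {"author": "a", "created_at": datetime(2021, 1, 1, 11, 10, tzinfo=timezone.utc),
     "text": "game tonight"},
]

for scheme in ("none", "author", "temporal"):
    print(scheme, pool_tweets(sample, scheme))
```

Each pooled string would then be passed to the topic model as a single document, so that the model sees longer texts than individual tweets.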