On Dynamic Topic Models for Mining Social Media

2019 
Analyzing media in real time is of great importance, with social media platforms at the epicenter of crunching, digesting, and disseminating content to the individuals connected to them. Within this context, topic models, especially latent Dirichlet allocation (LDA), have gained strong momentum due to their scalability, inference power, and compact semantics. However, state-of-the-art topic models fall short in handling large chunks of streaming data arriving dynamically on the platform, which hinders their quality of interpretation as well as their adaptability to information overload. In this manuscript (Jaradat et al. OLLDA: a supervised and dynamic topic mining framework in twitter. In: 2015 IEEE international conference on data mining workshop (ICDMW), November 2015. IEEE, Piscataway, pp. 1354–1359), we evaluate a labeled and online extension to LDA (OLLDA), which incorporates supervision through external labeling and the capability of quickly digesting real-time updates, making it more adaptive to Twitter and similar platforms. Our proposed extension can handle large quantities of newly arrived documents in a stream while achieving high topic inference quality given the short and often sloppy text of tweets. Our approach mainly uses an approximate inference technique based on variational inference coupled with a labeled LDA (L-LDA) model. We conclude by presenting experiments on a 1-year crawl of Twitter data that show significantly improved topical inference as well as temporal user profile classification when compared to state-of-the-art baselines. Given the popularity of word prediction techniques such as Word2vec, we present an additional benchmark to measure classification performance.