Contextual-LDA: A Context Coherent Latent Topic Model for Mining Large Corpora

2016 
Statistical topic models, represented by Latent Dirichlet Allocation (LDA) and its variants, are ubiquitously applied to understanding large corpora. However, topic models based on the bag-of-words (BoW) assumption rarely incorporate contextual information, which encompasses a great deal of useful knowledge in a document, into their probabilistic framework. This shortcoming prevents LDA from learning the contextual information carried by sentences and paragraphs. We present a context-coherent topic model for text learning, namely Contextual Latent Dirichlet Allocation (Contextual-LDA), that includes contextual knowledge without substantially increasing perplexity. In our model, a document is segmented into finely divided word sequences, each associated with one distinct latent topic to capture local context, while global context is obtained from the position at which a segment appears in the document. We learn the parameters with Gibbs sampling, analogous to traditional LDA. Our model retains the statistical strength of BoW by extending LDA without discarding the knowledge contained in the original context of documents. We also demonstrate it in a supervised scenario. Compared with the LDA model, experimental results on the BBC corpus in both unsupervised and supervised settings show that our method is well suited to text mining.
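To make the segment-level sampling idea concrete, the sketch below shows one collapsed Gibbs step in which a single topic is resampled for an entire word segment rather than for each word, so all words in a segment share a topic (the local-context property described above). Everything in it, including the function name, the hyperparameter values, the count arrays, and the product-of-per-word-terms approximation to the block probability, is an illustrative assumption; the paper's actual sampler, its segmentation scheme, and its handling of segment position (global context) may differ.

```python
# Hypothetical sketch: collapsed Gibbs resampling of one topic per
# word segment, under assumed symmetric Dirichlet priors. Not the
# authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
K, V = 10, 5000          # assumed number of topics / vocabulary size
alpha, beta = 0.1, 0.01  # assumed Dirichlet hyperparameters

def resample_segment(words, d, z_ds, n_dk, n_kw, n_k):
    """Resample the single topic shared by all words in one segment.

    words : int array of word ids in the segment
    d     : document index
    z_ds  : current topic of this segment
    n_dk  : (D, K) document-topic counts (segments per topic)
    n_kw  : (K, V) topic-word counts
    n_k   : (K,)   total words per topic
    """
    # Remove the segment's counts before drawing a new topic.
    n_dk[d, z_ds] -= 1
    np.add.at(n_kw[z_ds], words, -1)
    n_k[z_ds] -= len(words)

    # log p(k) ~ log(n_dk + alpha)
    #            + sum_w log(n_kw[k, w] + beta) - |seg| * log(n_k + V*beta)
    # (block probability approximated by a product of per-word terms)
    log_p = np.log(n_dk[d] + alpha)
    log_p += np.log(n_kw[:, words] + beta).sum(axis=1)
    log_p -= len(words) * np.log(n_k + V * beta)

    p = np.exp(log_p - log_p.max())
    k_new = rng.choice(K, p=p / p.sum())

    # Re-add the segment under the sampled topic.
    n_dk[d, k_new] += 1
    np.add.at(n_kw[k_new], words, 1)
    n_k[k_new] += len(words)
    return k_new
```

In a full sampler this step would be applied to every segment of every document on each sweep, with segment boundaries fixed in advance and the position-dependent prior for global context folded into the document-topic term.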