Overlapping thematic structures extraction with mixed-membership stochastic blockmodel

2018 
It is increasing important to identify automatically thematic structures from massive scientific literature. The interdisciplinarity enables thematic structures without natural boundaries. In this work, the identification of thematic structures is regarded as an overlapping community detection problem from the large-scale citation-link network. A mixed-membership stochastic blockmodel, armed with stochastic variational inference algorithm, is utilized to detect the overlapping thematic structures. In the meanwhile, in order to enhance readability, each theme is labeled with soft mutual information based method by several topical terms. Extensive experimental results on the astro dataset indicate that mixed-membership stochastic blockmodel primarily uses the local information and allows for the pervasive overlaps, but it favors similar sized themes, which disqualifies this approach from being used to extract the thematic structures from scientific literature. In addition, the thematic structures from the bibliographic coupling network is similar to those from the co-citation network.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    64
    References
    12
    Citations
    NaN
    KQI
    []