Topic Discovery for Biomedical Corpus Using MeSH Embeddings

2019 
Discovering latent topics in biomedical documents has become a pivotal task in many biomedical text mining applications. Medical Subject Headings (MeSH) terms, which are curated by human experts, provide highly precise keyword representations of biomedical documents. However, conventional topic models usually perform poorly on MeSH documents because each MeSH document contains only a small number of terms. In this paper, we propose a novel topic model for MeSH documents that uses MeSH embeddings. The proposed model overcomes the lack of context in MeSH documents by 1) exploiting the rich term-level co-occurrence patterns instead of the sparse document-level co-occurrence patterns, and 2) incorporating additional MeSH semantics through MeSH embeddings learned from a large external biomedical knowledge base. Experimental results on a real-world biomedical dataset show the efficacy of the proposed model in discovering coherent topics from MeSH documents.
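To make the two ideas in the abstract concrete, here is a minimal sketch, assuming a simple biterm-style reading of "term-level co-occurrence": every unordered pair of distinct MeSH terms in a document counts as one co-occurrence, pooled over the whole corpus. It also shows cosine similarity as one plausible way to compare two terms through embedding vectors. This is an illustration, not the authors' model. The function names `term_cooccurrences` and `cosine`, the toy documents, and the 2-dimensional vectors are all hypothetical; real MeSH embeddings would be learned from an external knowledge base.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence


def term_cooccurrences(docs: Iterable[list[str]]) -> Counter:
    """Count unordered pairs of distinct MeSH terms that appear in the same document.

    Hypothetical helper: pooling pairs across the corpus gives many more
    co-occurrence observations than treating each short document as one unit.
    """
    counts: Counter = Counter()
    for doc in docs:
        # Deduplicate and sort so every pair has one canonical (a, b) order.
        terms = sorted(set(doc))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy MeSH "documents" (hypothetical examples, not from the paper's dataset).
docs = [
    ["Neoplasms", "Apoptosis", "Humans"],
    ["Apoptosis", "Neoplasms"],
    ["Humans", "Diabetes Mellitus"],
]

counts = term_cooccurrences(docs)
# ("Apoptosis", "Neoplasms") co-occurs in two documents; every other pair in one.
print(counts.most_common(1))
```

Even the two-term document contributes a full co-occurrence observation here. This is the intuition behind preferring term-level over document-level statistics for very short documents.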