Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature Based on Weak Supervision

2019 
Biomedical literature in MEDLINE/PubMed is semantically indexed with MeSH thesaurus entries (subject annotations), each of which may correspond to several related but distinct domain concepts. In such cases, the subject annotations are coarser than the level of detail available in the domain and do not always meet the information needs of domain experts. In this work, we propose a method to automatically refine subject annotations at the level of concepts and apply it to the MeSH descriptor for Alzheimer's Disease, which corresponds to six different concepts representing disease sub-types. The results indicate that using concept occurrence as weak supervision can improve upon the predictive performance of literal string matching alone. The refined annotations can support more precise concept-based search, enable the integration of subject annotations with other semantic information, and facilitate the maintenance of subject annotation consistency as the MeSH thesaurus evolves with the addition of more detailed entries.
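As a rough illustration of what concept occurrence as a weak label could look like, the sketch below assigns concept labels to a text by literal term matching. It covers only the weak-labeling step, not the training of a predictive model on those labels. The lexicon, the concept identifiers (`concept_a`, `concept_b`) and the function name `weak_labels` are hypothetical placeholders, not the concepts, terms or code used in this work.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicon: identifiers and terms are illustrative placeholders,
# not the actual MeSH concepts or synonyms.
CONCEPT_TERMS: Dict[str, List[str]] = {
    "concept_a": ["early onset alzheimer disease"],
    "concept_b": ["late onset alzheimer disease"],
}


def weak_labels(text: str, lexicon: Dict[str, List[str]] = CONCEPT_TERMS) -> Set[str]:
    """Return the concepts that have at least one term occurring literally in the text."""
    # Lowercase and collapse whitespace so matching ignores case and line breaks.
    normalized = re.sub(r"\s+", " ", text.lower())
    found: Set[str] = set()
    for concept, terms in lexicon.items():
        for term in terms:
            # Word boundaries avoid matches inside longer words.
            if re.search(r"\b" + re.escape(term) + r"\b", normalized):
                found.add(concept)
                break
    return found
```

Labels produced this way are noisy (for example, hyphenated variants of a term are missed by this simple matcher), which is why they would serve as weak supervision for a learned model rather than as final annotations.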