PubMed-Scale Chemical Concept Embeddings Reconstruct Physical Protein Interaction Networks

2021 
PubMed is the largest resource of curated biomedical knowledge to date, comprising more than 30 million documents. The sheer volume of new literature makes it impossible for a single expert to keep track of all potentially relevant papers, resulting in knowledge gaps. In this paper we present ChemMeSHNet, a newly developed PubMed-based network comprising more than one million associations, constructed from expert-curated MeSH annotations of chemicals across all currently available PubMed papers. By learning latent representations of concepts in the obtained network, we demonstrate that purely literature-based representations are sufficient to reconstruct a large part of the currently known network of physical, empirically determined protein-protein interactions. Specifically, simple linear embeddings of node pairs, when coupled with a neural network-based classifier, reliably reconstruct the existing collection of empirically confirmed protein-protein interactions. Further, we demonstrate how pairs of learned representations can be used to prioritize potentially interesting novel interactions based on their common chemical context. Highly ranked interactions are qualitatively inspected in terms of potential complex formation at the structural level, and represent potentially interesting new knowledge. Finally, we show that two protein-protein interactions prioritized by structure-based approaches also emerge as probable according to the trained machine learning model.
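
To make the pipeline described above concrete, the following is a minimal sketch of the general idea: node embeddings are combined into pair vectors by a simple linear operator, a small neural network classifier is trained on known pairs, and candidate pairs are then ranked by predicted probability. Everything here is an illustrative assumption and not the authors' implementation: the embeddings and labels are synthetic, and the function names (`pair_features`, `train_mlp`, `predict_proba`), the concatenation/mean operators, and the one-hidden-layer architecture are hypothetical choices.

```python
import numpy as np

def pair_features(emb, pairs, mode="concat"):
    """Combine two node embeddings into one pair vector (hypothetical operators)."""
    a, b = emb[pairs[:, 0]], emb[pairs[:, 1]]
    if mode == "concat":
        return np.concatenate([a, b], axis=1)
    if mode == "mean":
        return (a + b) / 2.0
    raise ValueError(f"unknown mode: {mode}")

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=16, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)               # hidden activations
        p = sigmoid(H @ W2 + b2)               # predicted interaction probability
        eps = 1e-12
        losses.append(float(-np.mean(y * np.log(p + eps)
                                     + (1 - y) * np.log(1 - p + eps))))
        g_out = (p - y) / len(y)               # gradient of the loss w.r.t. the logit
        g_H = np.outer(g_out, W2) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum()
        W1 -= lr * (X.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)

# Synthetic stand-ins: 50 "concepts" with 8-dimensional embeddings (assumption).
rng = np.random.default_rng(42)
emb = rng.normal(size=(50, 8))
pairs = rng.integers(0, 50, size=(200, 2))
# Toy labels that depend on the pair, standing in for known interactions.
y = ((emb[pairs[:, 0], 0] + emb[pairs[:, 1], 0]) > 0).astype(float)

X = pair_features(emb, pairs, mode="concat")
params, losses = train_mlp(X, y)

# Rank unseen candidate pairs by predicted probability, highest first.
candidates = rng.integers(0, 50, size=(20, 2))
probs = predict_proba(params, pair_features(emb, candidates))
ranked = candidates[np.argsort(-probs)]
```

In a real setting the synthetic `emb` array would be replaced by embeddings learned from the literature-derived network, and the toy labels by empirically confirmed interactions with sampled negatives.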