OntoSem: an Ontology Semantic Representation Methodology for Biomedical Domain

2020 
Ontologies are essential tools for describing biomedical concepts and entities, supporting fundamental biomedical research such as semantic similarity analysis and protein-protein interaction prediction. An increasing amount of ontology-like domain knowledge appears in the scientific literature; meanwhile, advanced natural language processing (NLP) techniques for automatically extracting information from text have become widespread. Both trends facilitate the exploration of semantic representations of biomedical ontologies. We propose a novel distributional semantic representation methodology that combines two pre-trained, domain-specific word embedding tools, the non-contextualized Word2Vec and the context-dependent NCBI-blueBERT, to improve the encoding of biomedical ontologies. Furthermore, we use a randomly initialized bidirectional LSTM to project the resulting word vector sequence to a fixed-length sentence vector, giving downstream tasks a flexible and uniform input. We evaluate our method on two categories of tasks: similarity assessment of ontology terms, and ontology annotation-based protein-protein interaction classification. Experimental results are encouraging relative to the baselines in all tests. Our approach offers promising opportunities for representing ontology semantics and, in turn, characterizing entities such as proteins in biomedical research.
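The abstract does not specify how the two embeddings are combined or how the BiLSTM output is reduced to a fixed length, so the following is only a minimal sketch of the described pipeline under stated assumptions, not the authors' implementation. Real Word2Vec and NCBI-blueBERT lookups are replaced by hypothetical deterministic stubs (`stub_word2vec`, `stub_bluebert`); the per-word static and contextual vectors are assumed to be concatenated; and the fixed-length term vector is assumed to be the concatenation of the final forward and backward LSTM hidden states. Cosine similarity between term vectors stands in for a similarity-assessment score. It is written in plain Python so it runs without external libraries.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """One LSTM direction with randomly initialized (untrained) weights."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        scale = 1.0 / math.sqrt(hidden_dim)
        # One weight matrix per gate (input, forget, candidate, output) over [x; h_prev].
        self.W = {gate: [[rng.uniform(-scale, scale) for _ in range(input_dim + hidden_dim)]
                         for _ in range(hidden_dim)] for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifgo"}

    def run(self, seq):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            z = list(x) + h
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifgo"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            g = [math.tanh(v) for v in pre["g"]]
            o = [sigmoid(v) for v in pre["o"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h  # final hidden state of this direction

class BiLSTMEncoder:
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMCell(input_dim, hidden_dim, rng)
        self.bwd = LSTMCell(input_dim, hidden_dim, rng)

    def encode(self, word_vectors):
        # Assumed pooling: final forward state + final backward state (length 2 * hidden_dim).
        return self.fwd.run(word_vectors) + self.bwd.run(list(reversed(word_vectors)))

def stub_word2vec(word, dim=4):
    """Hypothetical stand-in for a static (non-contextual) Word2Vec lookup."""
    rng = random.Random("w2v:" + word.lower())
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def stub_bluebert(words, dim=6):
    """Hypothetical stand-in for contextual blueBERT vectors: each depends on the whole term."""
    context = " ".join(w.lower() for w in words)
    vectors = []
    for i in range(len(words)):
        rng = random.Random(f"bert:{i}:{context}")
        vectors.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
    return vectors

def combined_word_vectors(term):
    words = term.split()
    # Assumed combination: concatenate static and contextual vectors per word.
    return [stub_word2vec(w) + ctx for w, ctx in zip(words, stub_bluebert(words))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

encoder = BiLSTMEncoder(input_dim=4 + 6, hidden_dim=8)
a = encoder.encode(combined_word_vectors("regulation of cell cycle"))
b = encoder.encode(combined_word_vectors("cell cycle process"))
print(len(a), len(b))  # 16 16
print(round(cosine(a, b), 3))
```

Terms of different lengths map to vectors of the same size (here 16), which is what lets a single similarity function or classifier consume any ontology term. With untrained weights and stub embeddings the printed similarity carries no biological meaning; it only demonstrates the data flow.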