Auditing subtype inconsistencies among gene ontology concepts

2017 
Gene Ontology (GO) provides a controlled vocabulary for describing genes and related gene products. Quality assurance of GO is a vital aspect of the terminology management lifecycle. In this paper, we introduce a lexical-based inference approach to detecting subtype (or is-a) inconsistencies among GO terms (i.e., biological concepts). We first model the name of each concept as a set of words. Then, we generate hierarchically linked and unlinked pairs of concepts (A, B) whose names have the same number of words, share common words, and differ in exactly one word. Each linked concept-pair infers a linked term-pair, and each unlinked concept-pair infers an unlinked term-pair, where a term-pair consists of the two differing words. A term-pair that appears as both linked and unlinked is considered a potential inconsistency, which may indicate a subtype inconsistency between the original linked and unlinked concept-pairs. Applying this approach to the 03/28/2017 release of GO yielded a total of 3,715 potential subtype inconsistencies. Evaluation of a random sample of these potential inconsistencies revealed two types of potential errors in GO, missing subtype relations and incorrect subtype relations, and showed that the approach detects such errors with an accuracy of 56.33%. This indicates that lexical-based inference using the set-of-words model is a promising way to facilitate quality improvement of GO.
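The inference steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: "linked" is taken to mean a direct is-a edge (child, parent), "unlinked" means neither concept is an ancestor of the other, and names are split on whitespace after lowercasing. The helper names (`word_set`, `term_pair`, `potential_inconsistencies`) and the toy concepts `C1` to `C4` are hypothetical and are not drawn from a GO release.

```python
from collections import defaultdict
from itertools import permutations


def word_set(name):
    # Set-of-words model (assumed tokenization): lowercase, split on whitespace.
    return frozenset(name.lower().split())


def term_pair(name_a, name_b):
    # If both names have the same number of words, share at least one word,
    # and differ in exactly one word, return (word only in A, word only in B).
    a, b = word_set(name_a), word_set(name_b)
    if len(a) != len(b) or not (a & b):
        return None
    only_a, only_b = a - b, b - a
    if len(only_a) == 1 and len(only_b) == 1:
        return (next(iter(only_a)), next(iter(only_b)))
    return None


def ancestors(concept, parents):
    # All concepts reachable from `concept` by following is-a edges upward.
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def potential_inconsistencies(names, isa_edges):
    # names: concept id -> name; isa_edges: (child id, parent id) pairs.
    parents = defaultdict(set)
    for child, parent in isa_edges:
        parents[child].add(parent)
    anc = {c: ancestors(c, parents) for c in names}

    # Linked concept-pairs (assumed: direct is-a edges) infer linked term-pairs.
    linked = defaultdict(list)
    for child, parent in isa_edges:
        tp = term_pair(names[child], names[parent])
        if tp:
            linked[tp].append((child, parent))

    # Unlinked concept-pairs (assumed: no ancestor relation either way),
    # checked in both orders, infer unlinked term-pairs.
    unlinked = defaultdict(list)
    for a, b in permutations(names, 2):
        if b in anc[a] or a in anc[b]:
            continue
        tp = term_pair(names[a], names[b])
        if tp:
            unlinked[tp].append((a, b))

    # A term-pair seen as both linked and unlinked is a potential inconsistency.
    return {tp: (linked[tp], unlinked[tp]) for tp in linked if tp in unlinked}


# Hypothetical toy data, not taken from GO.
names = {
    "C1": "cardiac muscle development",
    "C2": "striated muscle development",
    "C3": "cardiac muscle contraction",
    "C4": "striated muscle contraction",
}
isa_edges = [("C1", "C2")]

print(potential_inconsistencies(names, isa_edges))
# {('cardiac', 'striated'): ([('C1', 'C2')], [('C3', 'C4')])}
```

In this toy input, the linked pair (C1, C2) infers the term-pair (cardiac, striated), and the unlinked pair (C3, C4) infers the same term-pair, so the sketch flags a possibly missing is-a relation from C3 to C4. As the 56.33% accuracy figure suggests, a flag is only a candidate for expert review, not a confirmed error. The all-pairs loop is quadratic and is meant only for illustration.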