Detection of Entity Mixture in Knowledge Bases Using Hierarchical Clustering

2016 
Entity mixture in a knowledge base refers to the situation in which some attributes of one entity are mistakenly attributed to another entity. It often occurs among homonymous entities, which share the same value of the attribute “Name”. Eliminating entity mixture is critical to ensuring data accuracy and validity for knowledge-based services. However, current research on entity disambiguation mainly focuses on determining the identity of entities mentioned in text during information extraction for building a knowledge base, while little work has been done to verify the information in an already built knowledge base. In this paper, we propose a generic method to detect mixed homonymous entities in a knowledge base using hierarchical clustering. The principle of our method for differentiating entities is to detect inconsistency among their attributes by analyzing how their attribute values are distributed across the documents of a common corpus. Experiments on a data set from industry applications have been conducted to demonstrate the workflow of performing the clustering and detecting mixed entities in a knowledge base using our method.
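
To make the principle concrete, the following is a minimal sketch, not the paper's implementation. It assumes each attribute value of a homonymous entity is represented by the set of corpus documents in which it appears, and that values which rarely co-occur belong to different real-world entities. The Jaccard distance, the average linkage, the 0.8 cut-off, and all names and toy data are assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of document IDs (assumed measure)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def cluster_attribute_values(doc_sets, threshold=0.8):
    """Agglomerative clustering of attribute values with average linkage.

    doc_sets maps each attribute value to the set of documents it appears in.
    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (a hypothetical cut-off, not taken from the paper).
    """
    clusters = [[value] for value in sorted(doc_sets)]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(jaccard_distance(doc_sets[x], doc_sets[y])
                    for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # safe because i < j
    return clusters


def is_mixed(doc_sets, threshold=0.8):
    """Flag an entity as mixed if its attribute values form several clusters."""
    return len(cluster_attribute_values(doc_sets, threshold)) > 1


# Toy record for one "Name" value; document IDs are invented for illustration.
example = {
    "Acme Corp": {1, 2, 3},
    "Chief Engineer": {2, 3, 4},
    "Boston": {1, 2, 4},
    "Painter": {7, 8},
    "Lyon": {8, 9},
}
groups = cluster_attribute_values(example)
```

On this toy record the attribute values separate into two groups, one around "Acme Corp" and one around "Painter", so `is_mixed(example)` flags the record as a likely mixture of two homonymous entities. A real corpus would need a distance measure and cut-off tuned to its own co-occurrence statistics.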