Keyword Search with Real-time Entity Resolution in Relational Databases

2018 
Traditional IR-style keyword search methods for relational databases assume clean data and perform no entity resolution (ER). As a result, on dirty datasets, where duplicate tuples have different identifiers but refer to the same real-world entity, their answers to a query may contain duplicates. In this paper, we propose a method for processing top-N keyword queries with real-time ER. The method creates an index to obtain candidate tuples for a keyword query, defines a function to compute the similarity between the query and each candidate tuple, and designs a divide-and-conquer clustering algorithm to deduplicate the query results. Extensive experiments confirm the effectiveness and efficiency of the method on both dirty and (almost) clean datasets.
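
The three steps named in the abstract (index lookup, query-tuple similarity, divide-and-conquer deduplication) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify the index structure, the similarity function, or the clustering rule, so the inverted index, the token-overlap score, the Jaccard measure, the 0.5 threshold, and every function name below are assumptions chosen only for illustration.

```python
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def build_index(tuples):
    """Inverted index (assumed structure): token -> set of tuple ids."""
    index = defaultdict(set)
    for tid, text in tuples.items():
        for tok in tokenize(text):
            index[tok].add(tid)
    return index

def candidates(index, query):
    """Every tuple id that contains at least one query token."""
    ids = set()
    for tok in tokenize(query):
        ids |= index.get(tok, set())
    return ids

def query_similarity(query, text):
    """Placeholder score: fraction of query tokens found in the tuple."""
    q, t = set(tokenize(query)), set(tokenize(text))
    return len(q & t) / len(q) if q else 0.0

def jaccard(a, b):
    """Placeholder tuple-to-tuple similarity used to detect duplicates."""
    a, b = set(tokenize(a)), set(tokenize(b))
    return len(a & b) / len(a | b) if (a | b) else 0.0

def cluster(ids, tuples, threshold):
    """Divide and conquer: cluster each half, then merge clusters across
    the halves whenever any pair of members is similar enough."""
    if not ids:
        return []
    if len(ids) == 1:
        return [list(ids)]
    mid = len(ids) // 2
    merged = cluster(ids[:mid], tuples, threshold)
    for rc in cluster(ids[mid:], tuples, threshold):
        combined, keep = list(rc), []
        for c in merged:
            if any(jaccard(tuples[x], tuples[y]) >= threshold
                   for x in c for y in rc):
                combined.extend(c)   # same entity: absorb this cluster
            else:
                keep.append(c)
        merged = keep + [combined]
    return merged

def top_n(query, tuples, index, n, threshold=0.5):
    """Return up to n (score, representative id, cluster ids) results,
    one per cluster of duplicate tuples."""
    ids = sorted(candidates(index, query))
    results = []
    for c in cluster(ids, tuples, threshold):
        # Representative: highest score, smallest id on ties.
        best = max(c, key=lambda t: (query_similarity(query, tuples[t]), -t))
        results.append((query_similarity(query, tuples[best]), best, sorted(c)))
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:n]

# Hypothetical toy data: tuples 1 and 2 describe the same person.
TUPLES = {
    1: "John Smith database systems",
    2: "Jon Smith database systems",
    3: "Mary Jones keyword search",
}
INDEX = build_index(TUPLES)
```

With this toy data, `top_n("smith database", TUPLES, INDEX, n=5)` returns a single result whose cluster holds tuples 1 and 2, so the duplicate pair appears once in the answer rather than twice.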