Distance-Based Adaptive Record Matching for Web Databases

2012 
Identifying duplicate records across multiple Web databases is an important step in Deep Web information integration. Because of query dependency, the lack of training samples, and the need for online processing, most state-of-the-art record matching methods are not applicable to the Web database scenario. Based on an analysis of the existing methods, an adaptive distance-based record matching method is proposed that introduces dynamic adjustment of attribute weights. During the iterative calculation of record similarity, the weight of each attribute is dynamically recalculated by increasing the weights of attributes with higher similarity in the matching record set and increasing the weights of attributes with lower similarity in the non-matching record set. The proposed method requires neither training data nor human effort, and the experimental results show that it works well for the Web database scenario.
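The iterative reweighting described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the update rule, so the rule used here (weight proportional to mean similarity in the matching set plus one minus mean similarity in the non-matching set), the fixed threshold, the function name `adapt_weights`, and the sample similarity values are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_similarity(sims: Sequence[float], weights: Sequence[float]) -> float:
    """Record-pair similarity as a weighted sum of per-attribute similarities."""
    return sum(w * s for w, s in zip(weights, sims))


def _column_means(rows: List[Sequence[float]], n: int) -> Optional[List[float]]:
    """Mean similarity of each attribute over a set of pairs (None if empty)."""
    if not rows:
        return None
    return [sum(r[i] for r in rows) / len(rows) for i in range(n)]


def adapt_weights(
    pair_sims: List[Sequence[float]],
    threshold: float = 0.5,   # assumed decision threshold
    max_iters: int = 20,
    tol: float = 1e-6,
) -> Tuple[List[float], List[int]]:
    """Iteratively re-estimate attribute weights without training data.

    pair_sims[k][i] is the similarity in [0, 1] of attribute i for pair k.
    Returns the final weights and the indices of pairs judged to match.
    """
    n = len(pair_sims[0])
    weights = [1.0 / n] * n  # start from uniform weights

    for _ in range(max_iters):
        # Split pairs into matching / non-matching under the current weights.
        matching = [p for p in pair_sims
                    if weighted_similarity(p, weights) >= threshold]
        non_matching = [p for p in pair_sims
                        if weighted_similarity(p, weights) < threshold]

        m_means = _column_means(matching, n)
        nm_means = _column_means(non_matching, n)

        # Assumed update rule: reward attributes that are similar among
        # matches and dissimilar among non-matches.
        scores = [0.0] * n
        if m_means is not None:
            scores = [s + m for s, m in zip(scores, m_means)]
        if nm_means is not None:
            scores = [s + (1.0 - m) for s, m in zip(scores, nm_means)]

        total = sum(scores)
        new_weights = ([v / total for v in scores] if total > 0
                       else [1.0 / n] * n)

        converged = max(abs(a - b) for a, b in zip(weights, new_weights)) < tol
        weights = new_weights
        if converged:
            break

    matches = [k for k, p in enumerate(pair_sims)
               if weighted_similarity(p, weights) >= threshold]
    return weights, matches


if __name__ == "__main__":
    # Hypothetical similarities for (title, author, price) on four record pairs.
    sample = [
        [0.9, 0.8, 0.5],
        [1.0, 0.9, 0.4],
        [0.1, 0.2, 0.5],
        [0.2, 0.1, 0.6],
    ]
    final_weights, matched = adapt_weights(sample)
    print(final_weights, matched)
```

In this made-up sample, the third attribute has similar values in both sets, so it discriminates poorly and ends up with the smallest weight, while the first two pairs are reported as matches.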