Deep Web Data Integration with near Duplicate Free

2013 
Deep web data integration has become the center of many research efforts in the recent few years. Near duplicate detection is very important for deep web integration system, there are seldom researches focusing on integrating deep web Integration and near duplicate detection together. In this paper, we develop a integration system, DWI-ndfree to solve this problem. The wrapper of DWI-ndfree consists of four parts: the form filler, the navigator, the extractor and the near duplicate detector. To find near duplicate records, we propose efficient algorithm CheckNearDuplicate. DWI-ndfree can integrate deep web data with near duplicate free and has been used to execute several web extraction and integration tasks efficiently.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    11
    References
    0
    Citations
    NaN
    KQI
    []