Identification of Deep Web Entries by Using Neural Network

2012 
The deep web is the fastest-growing new resource on the Internet, and building data integration systems for it has become a research focus. Deep web entries, whose automatic identification is the basis of deep web data integration, usually appear as HTML forms. Because form design is subjective and there are no unified construction standards, it is difficult to judge whether an HTML form is a deep web entry using heuristics and manually specified rules. Based on the global schema and the notion of machine learning, this paper proposes an approach that identifies deep web entries using a neural network. From statistics over a large collection of form data, the paper derives 14 features that distinguish query interfaces from non-query interfaces. Experiments on 12 data sets show that the proposed approach achieves higher accuracy, and it is therefore recommended for the automatic identification of deep web entries.
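As a rough illustration of the idea of turning an HTML form into a numeric feature vector and scoring it, a minimal Python sketch might look as follows. The abstract does not list the paper's 14 features, so the six features here are hypothetical stand-ins. The weights are hand-picked placeholders rather than trained values, and a single sigmoid unit stands in for the paper's neural network.

```python
import math
from html.parser import HTMLParser


class FormFeatureParser(HTMLParser):
    """Counts a few form properties. These features are hypothetical,
    not the 14 features used in the paper."""

    def __init__(self):
        super().__init__()
        self.features = {
            "text_inputs": 0,
            "password_inputs": 0,
            "selects": 0,
            "checkboxes": 0,
            "submit_buttons": 0,
            "uses_get": 0,
        }

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            method = (attrs.get("method") or "get").lower()
            self.features["uses_get"] = int(method == "get")
        elif tag == "select":
            self.features["selects"] += 1
        elif tag == "input":
            # Inputs default to type="text" when no type is given.
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "search"):
                self.features["text_inputs"] += 1
            elif kind == "password":
                self.features["password_inputs"] += 1
            elif kind == "checkbox":
                self.features["checkboxes"] += 1
            elif kind == "submit":
                self.features["submit_buttons"] += 1


def extract_features(html):
    """Parse one HTML form and return its feature dictionary."""
    parser = FormFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.features


# Placeholder weights chosen by hand for illustration only; a real
# classifier would learn them from labeled forms.
PLACEHOLDER_WEIGHTS = {
    "text_inputs": 0.5,
    "password_inputs": -3.0,
    "selects": 1.0,
    "checkboxes": 0.5,
    "submit_buttons": 0.0,
    "uses_get": 1.0,
}
PLACEHOLDER_BIAS = -0.5


def query_interface_score(features, weights=PLACEHOLDER_WEIGHTS,
                          bias=PLACEHOLDER_BIAS):
    """Single sigmoid unit: a simplified stand-in for a neural network."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


SEARCH_FORM = (
    '<form action="/search" method="get">'
    '<input type="text" name="q">'
    '<select name="category"><option>books</option></select>'
    '<input type="submit"></form>'
)
LOGIN_FORM = (
    '<form action="/login" method="post">'
    '<input type="text" name="user">'
    '<input type="password" name="pw">'
    '<input type="submit"></form>'
)

search_score = query_interface_score(extract_features(SEARCH_FORM))
login_score = query_interface_score(extract_features(LOGIN_FORM))
```

With these placeholder weights the search form scores above 0.5 and the login form below it. In practice the weights would be learned from labeled query and non-query forms instead of being set by hand.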