Spatial Relation Based Object Extraction from the World Wide Web.

2008 
Statistical observations show that Web information about objects of the same type exhibits regular spatial distribution characteristics across different Web sites. The spatial distance between components within one object is always less than that between different objects. A novel method based on the spatial configuration of Web documents is presented for extracting objects from the World Wide Web. The method is a fully automatic, bottom-up object extraction process. It primarily considers the distribution characteristics of Web information and is independent of the underlying document representation, such as HTML code. Experimental results are encouraging: the proposed method works well even when the HTML structure differs greatly from the layout structure.
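
The observation that components of one object lie closer together than components of different objects can be illustrated with a small sketch. The abstract does not describe the actual algorithm, so the code below is only an assumption of how a bottom-up grouping might look: it uses plain single-linkage agglomerative clustering over rendered bounding boxes. The `Box` type, the `gap` distance, the `extract_objects` function, and the `threshold` parameter are all hypothetical names introduced for illustration, not taken from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of one page element (hypothetical input format)."""
    label: str
    x: float
    y: float
    w: float
    h: float


def gap(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes; 0.0 if they touch or overlap."""
    dx = max(a.x - (b.x + b.w), b.x - (a.x + a.w), 0.0)
    dy = max(a.y - (b.y + b.h), b.y - (a.y + a.h), 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_distance(g1, g2) -> float:
    """Single-linkage distance: smallest gap between any two members."""
    return min(gap(a, b) for a in g1 for b in g2)


def extract_objects(boxes, threshold):
    """Merge the closest groups bottom-up until the closest pair is
    farther apart than `threshold` (an assumed, hand-picked cut-off)."""
    groups = [[b] for b in boxes]
    while len(groups) > 1:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda p: group_distance(groups[p[0]], groups[p[1]]),
        )
        if group_distance(groups[i], groups[j]) > threshold:
            break
        groups[i] = groups[i] + groups[j]  # i < j, so deleting j is safe
        del groups[j]
    return groups


# Two made-up product records laid out vertically on a page.
page = [
    Box("title1", 0, 0, 100, 20),
    Box("price1", 0, 25, 100, 20),
    Box("title2", 0, 100, 100, 20),
    Box("price2", 0, 125, 100, 20),
]
objects = [[b.label for b in g] for g in extract_objects(page, threshold=20)]
print(objects)
```

Because the sketch works only on rendered coordinates, it never inspects the markup, which mirrors the abstract's point about independence from the HTML structure. A fixed threshold is the simplest possible stopping rule and is an assumption of this sketch.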