Web data extraction method based on visual customization of extraction template

2011 
The invention discloses a Web data extraction method based on visual customization of an extraction template. The method comprises the following steps: A. preprocessing of template pages: the source code of each template page is converted and displayed; B. visual customization of the extraction template: the user interface provides a drag-selection function with which a user maps the attribute tags and data values on the template page to attributes in a domain model, thereby establishing the extraction template; C. setting the batch extraction frequency: the crawled HTML (Hypertext Markup Language) pages are extracted in batches once every 8 hours; and D. batch extraction of the pages: the crawled HTML pages are extracted in batches using the corresponding extraction template, converting semi-structured data into structured data, which is then stored in a local database.
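Steps B and D can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the extraction template has already been reduced to a plain mapping from domain-model attributes to the label text seen on the template page, and that each data value sits in the table cell immediately after its label. The names `CellCollector`, `extract`, `store`, the `product` table, and the sample page are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def extract(html, template):
    """Apply a template {domain_attribute: label_text} to one HTML page.

    Assumption: the value is the cell that directly follows the label cell.
    Attributes whose label is not found are returned as None.
    """
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    record = {}
    for attribute, label in template.items():
        record[attribute] = None
        for i, text in enumerate(cells[:-1]):
            if text.rstrip(":") == label:
                record[attribute] = cells[i + 1]
                break
    return record


def store(conn, records):
    """Write extracted records to a local database (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS product (name TEXT, price TEXT)")
    conn.executemany(
        "INSERT INTO product (name, price) VALUES (:name, :price)", records
    )
    conn.commit()


if __name__ == "__main__":
    # Hypothetical template, standing in for the result of drag selection.
    template = {"name": "Product", "price": "Price"}
    page = (
        "<table><tr><th>Product:</th><td>Lamp</td></tr>"
        "<tr><th>Price:</th><td>19.90</td></tr></table>"
    )
    conn = sqlite3.connect(":memory:")
    store(conn, [extract(page, template)])
    print(conn.execute("SELECT name, price FROM product").fetchall())
```

In a full system, the batch run in step D would loop over all crawled pages for a site with that site's template, and a scheduler would trigger it at the interval set in step C.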