Web Information Extraction Algorithm Based on Ontology and DOM Tree

2010 
Because information on the Web is vast, dynamic, and irregular, it is difficult to search and integrate. This paper proposes a Web information extraction algorithm based on ontology and the DOM tree. Extraction rules generated from the ontology locate the relevant regions of a page accurately and extract the information of interest precisely. The algorithm performs the extraction by traversing the DOM tree. Finally, we implement an information extraction system and test its performance on a news site. The test results show that the algorithm does not rely on page structure and that it can improve the recall and precision of information extraction.
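
The abstract does not spell out the rule format or the traversal, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a hypothetical mini-ontology (`ONTOLOGY`) that maps each concept to label terms, derives simple prefix rules from it, and applies them during a depth-first walk over a small DOM-like tree built with Python's standard `html.parser`. All names here (`ONTOLOGY`, `Node`, `TreeBuilder`, `build_rules`, `extract`) are illustrative assumptions.

```python
from html.parser import HTMLParser

# Hypothetical mini-ontology: concept -> label terms that may introduce it on a page.
ONTOLOGY = {
    "title": ["title", "headline"],
    "author": ["author", "by"],
    "date": ["date", "published"],
}


class Node:
    """One element of a simplified DOM-like tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""  # text directly inside this element


class TreeBuilder(HTMLParser):
    """Builds a small tree; assumes well-nested markup without void tags like <br>."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data


def build_rules(ontology):
    """Derive (concept, prefix) rules from the ontology's label terms."""
    return [(concept, label + ":")
            for concept, labels in ontology.items()
            for label in labels]


def extract(html, ontology=ONTOLOGY):
    """Walk the tree depth-first and keep the first match for each concept."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    rules = build_rules(ontology)
    found = {}
    stack = [builder.root]
    while stack:
        node = stack.pop()
        text = node.text.strip()
        lowered = text.lower()
        for concept, prefix in rules:
            if concept not in found and lowered.startswith(prefix):
                found[concept] = text[len(prefix):].strip()
        # Push children reversed so they are visited in document order.
        stack.extend(reversed(node.children))
    return found


if __name__ == "__main__":
    page = ("<html><body><div>Home</div><div><h1>Headline: Flood hits city</h1>"
            "<p>Author: Jane Doe</p><p>Date: 2010-05-01</p></div></body></html>")
    print(extract(page))
```

Because the rules match on ontology terms and not on fixed tag paths, the same call works whether the fields sit in `<p>` elements or in table cells, which illustrates the structure independence the abstract claims. A real system would need a robust HTML parser and much richer rules than prefix matching.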