Web Information Extraction Algorithm Based on Ontology and DOM Tree

Li Liu, Junfang Shi, Xinrui Liu

2010

Because information on the Web is vast, dynamic, and irregular, it is difficult to search and integrate. This paper proposes a Web information extraction algorithm based on ontology and the DOM tree. Extraction rules generated from the ontology locate the relevant regions of a page accurately and extract the information of interest precisely. The algorithm performs the extraction by traversing the DOM tree. Finally, we implement an information extraction system and test its performance on a news site. The test results show that the algorithm does not rely on the page structure and that it can improve the recall and precision of information extraction.
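
The abstract does not give the extraction rules themselves, so the following is only a minimal sketch of the general idea (ontology-derived rules applied while traversing a DOM tree), not the authors' algorithm. Everything named here is an assumption for illustration: the `ONTOLOGY` dictionary stands in for a real ontology, the rule "a concept keyword appears in an element's `class` or `id`" is a hypothetical rule, and `Node`, `TreeBuilder`, and `extract` are hypothetical helpers built on Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for an ontology: concept -> keywords that hint at it.
ONTOLOGY = {
    "title": {"headline", "title"},
    "author": {"author", "byline", "reporter"},
    "date": {"date", "published", "time"},
}


class Node:
    """One node of a minimal DOM-like tree (elements and '#text' nodes)."""

    def __init__(self, tag, attrs=(), parent=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = text


class TreeBuilder(HTMLParser):
    """Builds the tree; tolerant of unclosed tags, not a full HTML5 parser."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(Node("#text", parent=self.current, text=data))


def node_text(node):
    """Text of a node and its descendants, in document order."""
    if node.tag == "#text":
        return node.text.strip()
    parts = (node_text(child) for child in node.children)
    return " ".join(p for p in parts if p)


def match_concept(node):
    """Return the first concept whose keyword occurs in class/id, else None."""
    label = " ".join(node.attrs.get(k) or "" for k in ("class", "id")).lower()
    for concept, keywords in ONTOLOGY.items():
        if any(kw in label for kw in keywords):
            return concept
    return None


def extract(html):
    """Depth-first traversal; the first matching node wins for each concept."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    result = {}
    stack = [builder.root]
    while stack:
        node = stack.pop()
        concept = match_concept(node)
        if concept is not None and concept not in result:
            result[concept] = node_text(node)
        else:
            stack.extend(reversed(node.children))
    return result
```

Calling `extract(page_html)` returns a dictionary from concept name to extracted text. Matching on `class` and `id` labels is a simplification chosen to keep the sketch short; the paper's claim of independence from page structure would need rules that also inspect the text content itself.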

Keywords:

Web page
Computer science
XML
Information retrieval
Information extraction
Data mining
Document Object Model
Precision and recall
Ontology (information science)
Algorithm
Tree (data structure)
The Internet
Ontology
