An Approach of Extracting Web Information Based on HTMLParser

Shan Lin, Yanzhong Hu

2010

Many applications now need to analyze the detailed content of web pages, so extracting web information quickly and effectively has become very important. Web information is expressed primarily in HTML. HTMLParser is an open-source project hosted on SourceForge.net that can parse HTML in either a linear or a nested fashion. This paper analyzes the principle of extracting web information based on HTMLParser and presents an approach to implementing web information extraction with the classes and methods that HTMLParser provides. Finally, we demonstrate the detailed extraction process with an example.
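
As a rough illustration of the linear style of parsing mentioned in the abstract, the sketch below walks a page as a flat stream of tag and text events and collects every hyperlink. It is only a stand-in under stated assumptions: it uses the `html.parser` module from the Python standard library, which is a different library from the Java HTMLParser project discussed in the paper, and the names `LinkExtractor`, `extract_links`, and `SAMPLE` are hypothetical and chosen for this example.

```python
# NOTE: this is Python's standard-library html.parser, used as a stand-in.
# It is NOT the Java HTMLParser project described in the paper.
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Linear (event-driven) pass: collect (href, text) for each <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Start tracking only anchors that actually carry an href.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text outside an open anchor is ignored.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: return all (href, text) pairs found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page used only for this illustration.
SAMPLE = (
    '<html><body><p>See <a href="/a.html">page <b>A</b></a> '
    'and <a href="/b.html">B</a>.</p></body></html>'
)

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

A nested approach would instead build a tree of nodes first and then query it, which tends to suit extraction rules that depend on a tag's position inside its parents.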

Keywords:

Web development
Semantic Web Stack
Web page
Web modeling
Web standards
Information retrieval
Computer science
Web mapping
Data mining
Static web page
Web design
Web navigation
Data Web
World Wide Web
Web intelligence
