Method and device for extracting webpage contents

Tingyong Tang, Yulei Liu, Wei Li, Xi Wang, ko hirosi, Kai Zhang, Baisen He, Ying Huang, Huijiao Yang, Zhengkai Xie, Zhipei Wang, Cheng Feng, Sirui Liu

2013

The invention discloses a method and a device for extracting webpage contents, and relates to the technical field of the Internet. The method intelligently recognizes whether the body content of a webpage can be extracted, correctly extracts and displays that body content, and thereby improves the user's browsing experience. According to the method, when a request instruction for opening a first webpage is received, it is judged whether body content labels exist in the source code corresponding to the first webpage. When the body content labels exist in the source code, a reader extracts the body content of the first webpage enclosed by those labels. When the body content labels do not exist in the source code, a start position and an end position of the body content of the first webpage are recognized in the source code; body content labels are then added after the start position and before the end position, respectively, and the body content of the first webpage within the labels is extracted. The method and the device are suitable for use in webpage content extraction.
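The decision flow in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption, not taken from the patent: the label pair `<reader-body>`/`</reader-body>`, the helper functions `has_body_labels`, `locate_body`, `insert_labels` and `extract_body`, and especially the heuristic in `locate_body`. The abstract does not say how the start and end positions are recognized, so this sketch simply treats the span from the first `<p>` tag to the last `</p>` tag as the body. A plain text collector stands in for the "reader".

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Hypothetical body content labels; the abstract does not name the actual tags.
BODY_OPEN = "<reader-body>"
BODY_CLOSE = "</reader-body>"


class _TextCollector(HTMLParser):
    """Collects non-empty text nodes; a stand-in for the patent's 'reader'."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.parts.append(text)


def has_body_labels(source: str) -> bool:
    """Judge whether an opening label exists and is followed by a closing label."""
    start = source.find(BODY_OPEN)
    return start != -1 and source.find(BODY_CLOSE, start) != -1


def locate_body(source: str) -> Optional[tuple[int, int]]:
    """Hypothetical heuristic: the body runs from the first <p> to the last </p>."""
    first = re.search(r"<p[\s>]", source, re.IGNORECASE)
    closes = list(re.finditer(r"</p\s*>", source, re.IGNORECASE))
    if first is None or not closes or closes[-1].end() <= first.start():
        return None
    return first.start(), closes[-1].end()


def insert_labels(source: str, start: int, end: int) -> str:
    """Add the opening label at the start position and the closing label at the end position."""
    return source[:start] + BODY_OPEN + source[start:end] + BODY_CLOSE + source[end:]


def extract_body(source: str) -> Optional[str]:
    """Return the body text, adding labels first when the page lacks them."""
    if not has_body_labels(source):
        span = locate_body(source)
        if span is None:
            return None  # body not recognized; the page would be shown unchanged
        source = insert_labels(source, *span)
    start = source.index(BODY_OPEN) + len(BODY_OPEN)
    end = source.index(BODY_CLOSE, start)
    reader = _TextCollector()
    reader.feed(source[start:end])
    reader.close()
    return "\n".join(reader.parts)
```

For example, `extract_body("<nav>Menu</nav><p>Hello</p><p>World</p>")` takes the second branch. It inserts the labels around the two paragraphs and returns `"Hello\nWorld"`. A page that already carries the labels is read directly between them. Keeping label insertion separate from extraction mirrors the abstract's structure, where both branches converge on the same labelled-extraction step.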
