Web page sectioning using regex-based template
2008
This work presents a novel, site-specific algorithm for web page segmentation and section importance detection that leverages structural, content, and visual information. The structural and content information is captured by a template, a generalized regular expression learned over a set of pages. Combining the template with visual information yields high sectioning accuracy. Experimental results demonstrate the effectiveness of the approach.
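To illustrate what a "generalized regular expression learned over a set of pages" could look like, here is a minimal sketch. It is not the paper's algorithm: it assumes each page has already been reduced to a flat sequence of tag names, and that all pages from the site share the same order of tag runs. The function names `collapse_runs`, `learn_template`, and `matches` are hypothetical.

```python
import re
from itertools import groupby


def collapse_runs(tags):
    """Merge consecutive duplicate tags into (tag, was_repeated) pairs."""
    return [(tag, len(list(group)) > 1) for tag, group in groupby(tags)]


def learn_template(pages):
    """Generalize several tag sequences into one compiled regex.

    Assumption (not from the paper): every page has the same order of
    distinct tag runs; only the run lengths may differ between pages.
    """
    collapsed = [collapse_runs(page) for page in pages]
    shapes = {tuple(tag for tag, _ in c) for c in collapsed}
    if len(shapes) != 1:
        raise ValueError("pages do not share one run structure")

    parts = []
    for position in zip(*collapsed):
        tag = re.escape(position[0][0])
        # If any page repeats this tag, allow one or more occurrences.
        repeated = any(was_repeated for _, was_repeated in position)
        parts.append(f"(?:{tag} )+" if repeated else f"{tag} ")
    return re.compile("".join(parts))


def matches(template, tags):
    """Check whether a new page's tag sequence fits the template."""
    return template.fullmatch("".join(tag + " " for tag in tags)) is not None


# Two sample pages from the same (hypothetical) site.
pages = [
    ["div", "h1", "p", "p", "div"],
    ["div", "h1", "p", "div"],
]
template = learn_template(pages)
```

In this sketch, a page with three paragraphs still fits the learned template, while a page missing the `h1` run does not. A fuller approach would also need optional and alternative sections, and the visual cues the abstract mentions.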