Web page sectioning using regex­-based template

2008 
This work aims to provide a novel, site-specific web page segmentation and section importance detection algorithm, which leverages structural, content, and visual information. The structural and content information is leveraged via template, a generalized regular expression learnt over set of pages. The template along with visual information results into high sectioning accuracy. The experimental results demonstrate the effectiveness of the approach.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    1
    References
    2
    Citations
    NaN
    KQI
    []