A Web Page Segmentation Algorithm for Extracting Product Information

2006 
With the rapid development of the Internet, the Web is becoming the most popular and also the largest resource for people to acquire information. At the same time, search engines play an important role in retrieving information. Nevertheless, the smallest processing unit of a search engine is the whole web page, which contains plenty of noisy information. If the useful information can be extracted and used as the smallest processing unit, it can improve a search engine's precision; this need gave rise to page segmentation algorithms. However, traditional algorithms cannot extract blocks at the product level. Hence, a novel algorithm based on product features and the DOM (Document Object Model) is proposed. Compared with traditional algorithms, this page segmentation algorithm not only greatly enhances information consistency but also decreases complexity.
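The abstract does not spell out the algorithm's steps. As a rough, hypothetical illustration of the general idea of segmenting a DOM tree at the product level, the sketch below treats a product block as the smallest DOM subtree that still contains every one of a chosen set of product features. The feature set (a price string matching `PRICE_RE` plus an `<img>` element), the function names `has_features` and `product_blocks`, and the use of Python's `xml.etree.ElementTree` on well-formed XHTML are all assumptions made for this example, not the paper's method.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical product features (an assumption, not taken from the paper):
# a price-like string such as "$199.00" and at least one <img> element.
PRICE_RE = re.compile(r"\$\s?\d+(?:\.\d{2})?")


def has_features(elem):
    """Return True if the subtree rooted at elem contains all assumed product features."""
    text = "".join(elem.itertext())
    has_price = bool(PRICE_RE.search(text))
    has_image = elem.tag == "img" or elem.find(".//img") is not None
    return has_price and has_image


def product_blocks(elem):
    """Return the smallest subtrees that still contain all product features.

    A node becomes a block when it has the features but none of its
    children do on their own; otherwise recurse into the children.
    """
    if not has_features(elem):
        return []
    child_blocks = []
    for child in elem:
        child_blocks.extend(product_blocks(child))
    return child_blocks if child_blocks else [elem]


# A toy, well-formed page: navigation, two product entries, and a footer.
PAGE = """<html><body>
  <div id="nav"><a href="/">Home</a></div>
  <ul id="list">
    <li id="p1"><img src="a.jpg"/><span>Camera</span><span>$199.00</span></li>
    <li id="p2"><img src="b.jpg"/><span>Lens</span><span>$89</span></li>
  </ul>
  <div id="footer">Copyright 2006</div>
</body></html>"""

if __name__ == "__main__":
    root = ET.fromstring(PAGE)
    print([block.get("id") for block in product_blocks(root)])  # → ['p1', 'p2']
```

Real pages are rarely well-formed XML, so a tolerant HTML parser would be needed in practice. Choosing the smallest feature-bearing subtree is one simple way to keep each block limited to a single product while discarding navigation and footer noise.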