VRPSOFC: a framework for focused crawler using mutation improving particle swarm optimization algorithm.

2019 
The focused crawler is a key technology of search engines. It filters webpages with relevance algorithms until a stopping condition is met. Current focused crawlers are prone to topic drift and low precision while crawling webpages. This paper therefore proposes a focused crawler framework (VRPSOFC) based on mutation-improved particle swarm optimization. First, for each topic, VRPSOFC selects three types of seed pages that readily lead to large-scale aggregations of web pages, chosen according to page click rates in Google search: an official website, a Wikipedia page, and a forum or video page. VRPSOFC then crawls webpages using the mutation-improved particle swarm optimization algorithm proposed in this paper, with each seed page serving as an initial page. Finally, experiments are conducted in a real web environment and the results are analyzed. Compared with the traditional VSM and other methods, VRPSOFC obtains more accurate URL priorities and crawls higher-quality web pages. The results indicate that the proposed focused crawler framework is effective.
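The abstract does not spell out the update rules of the proposed algorithm, so the following is only a minimal sketch of a generic particle swarm optimizer with a random mutation step, not the authors' VRPSOFC method. The function name `pso_with_mutation`, all parameter values, the Gaussian mutation, and the toy `sphere` objective are assumptions chosen for illustration. In a crawler, the objective would be replaced by some topic-relevance score for candidate URLs.

```python
import random


def pso_with_mutation(objective, dim, n_particles=20, iters=100,
                      w=0.7, c1=1.5, c2=1.5, mutation_rate=0.1,
                      bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with a textbook PSO plus a Gaussian mutation.

    Illustrative only: the parameters and the mutation rule are assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            # Mutation (assumed form): occasionally perturb one coordinate
            # to help the swarm escape local optima.
            if rng.random() < mutation_rate:
                d = rng.randrange(dim)
                step = rng.gauss(0.0, 0.1 * (hi - lo))
                pos[i][d] = min(hi, max(lo, pos[i][d] + step))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_with_mutation(sphere, dim=3)
print(best, best_val)
```

The mutation step is applied after the regular position update, so the global best is never overwritten by a worse mutated position; it only changes when a strictly better value is found.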