Web crawler model of fetching data speedily based on Hadoop distributed system

2016 
This article proposes a web crawler model for fetching data at high speed based on the Hadoop distributed system, addressing the large volume of data, the lack of filtering and sorting, and other factors that prevent web crawlers from indexing Internet data promptly and effectively in the current network environment. The model transplants a single-threaded or multi-threaded web crawler into a distributed system by diversifying and personalizing the data-fetching and data-storage operations, which improves the crawler's scalability and reliability and allows the program to run on multiple operating systems. Experiments show that the model significantly improves the speed of extracting large amounts of data in a short time.
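The abstract does not include the authors' implementation, but the core idea of transplanting a sequential crawler into a Hadoop distributed system can be illustrated with a minimal, hypothetical map-only MapReduce job: the seed URL list is split across the cluster, and each map task fetches its URLs in parallel. The class names (CrawlerJob, FetchMapper), the map-only design, and the timeout values below are assumptions made for illustration, not the paper's actual model.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical sketch: each input line is one seed URL, so Hadoop's task
// scheduler parallelizes fetching across the cluster automatically.
public class CrawlerJob {

    public static class FetchMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String url = value.toString().trim();
            if (url.isEmpty()) return;
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                conn.setConnectTimeout(5000); // assumed timeouts
                conn.setReadTimeout(5000);
                StringBuilder body = new StringBuilder();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(conn.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        body.append(line).append('\n');
                    }
                }
                // Emit URL and fetched page size; a fuller model would store the
                // page in HDFS or HBase for the filtering and sorting stage.
                context.write(new Text(url), new Text(Integer.toString(body.length())));
            } catch (IOException e) {
                // Count failures instead of killing the task, so one dead
                // link does not stall the whole fetch phase.
                context.getCounter("crawler", "fetch_errors").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "distributed-crawler");
        job.setJarByClass(CrawlerJob.class);
        job.setMapperClass(FetchMapper.class);
        job.setNumReduceTasks(0);                 // map-only fetch phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // seed URL list
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // fetch results
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged as a jar, such a job would be launched with `hadoop jar crawler.jar CrawlerJob <seed-dir> <output-dir>`; scaling out is then a matter of adding nodes, which is the scalability property the abstract attributes to moving the crawler onto a distributed system.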