Web crawler model of fetching data speedily based on Hadoop distributed system

2016 
This article proposes a web crawler model for fetching data at high speed based on the Hadoop distributed system, addressing the large volume of data, the lack of filtering and sorting, and other factors that prevent web crawlers from indexing Internet data promptly and effectively in the current network environment. The model transplants a single-threaded or multi-threaded web crawler into a distributed system by diversifying and personalizing the data-fetching and data-storage operations, which improves the crawler's scalability and reliability and allows the program to run on multiple operating systems. Experiments show that the model significantly improves the speed of extracting large amounts of data in a short time.
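The abstract does not include the authors' implementation, but the core idea of transplanting a sequential crawler into a Hadoop distributed system can be illustrated with a minimal, hypothetical map-only MapReduce job: the seed URL list is split across the cluster, and each map task fetches its URLs in parallel. The class names (CrawlerJob, FetchMapper), the map-only design, and the timeout values below are assumptions made for illustration, not the paper's actual model.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical sketch: each input line is one seed URL, so Hadoop's task
// scheduler parallelizes fetching across the cluster automatically.
public class CrawlerJob {

    public static class FetchMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String url = value.toString().trim();
            if (url.isEmpty()) return;
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                conn.setConnectTimeout(5000); // assumed timeouts
                conn.setReadTimeout(5000);
                StringBuilder body = new StringBuilder();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(conn.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        body.append(line).append('\n');
                    }
                }
                // Emit URL and fetched page size; a fuller model would store the
                // page in HDFS or HBase for the filtering and sorting stage.
                context.write(new Text(url), new Text(Integer.toString(body.length())));
            } catch (IOException e) {
                // Count failures instead of killing the task, so one dead
                // link does not stall the whole fetch phase.
                context.getCounter("crawler", "fetch_errors").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "distributed-crawler");
        job.setJarByClass(CrawlerJob.class);
        job.setMapperClass(FetchMapper.class);
        job.setNumReduceTasks(0);                 // map-only fetch phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // seed URL list
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // fetch results
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged as a jar, such a job would be launched with `hadoop jar crawler.jar CrawlerJob <seed-dir> <output-dir>`; scaling out is then a matter of adding nodes, which is the scalability property the abstract attributes to moving the crawler onto a distributed system.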