Current challenges in web crawling

Denis Shestakov

2013

Web crawling, the process of collecting web pages in an automated manner, is the primary and ubiquitous operation used by a large number of web systems and agents, ranging from a simple program for website backup to a major web search engine. Because of the astronomical amount of data already published on the Web and the ongoing exponential growth of web content, any party that wants to take advantage of massive-scale web data faces a high barrier to entry. In this tutorial, we will introduce the audience to five topics: the architecture and implementation of a high-performance web crawler, collaborative web crawling, crawling the deep Web, crawling multimedia content, and future directions in web crawling research.

Keywords:

Web page
Web modeling
Web development
World Wide Web
Semantic Web Stack
Distributed web crawling
Data Web
Static web page
Internet privacy
Computer science
Web navigation
Web standards
Web design

