An Integrated Crawling Strategy for Domain-Specific Resource Discovery

Fuyong Yuan, Chunxia Yin, Jian Liu, Yulian Zhang

2007

A topic-specific crawler aims to selectively seek out pages that are relevant to a predefined set of topics, rather than to explore all regions of the Web, which makes it important for domain-specific resource discovery. By restricting themselves to web pages from a specific domain, topic-specific crawlers yield good recall as well as good precision. In this paper, we present an integrated topic-specific crawling strategy. Its main components are a topic specification module, which mediates between users and search engines to identify starting URLs by computing hub scores with the BHIST algorithm, and a URL ordering algorithm that combines features of several previous approaches. Experimental results indicate that the new crawling method performs better and fetches more topic-relevant information.
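
The two components named in the abstract, hub-based seed selection and URL ordering, can be illustrated with a minimal sketch. The abstract does not define BHIST or the paper's ordering function, so this sketch substitutes the standard HITS hub computation and a plain best-first priority queue; the names `hub_scores` and `Frontier` are hypothetical and are not taken from the paper.

```python
import heapq
import math


def hub_scores(links, iterations=20):
    """Standard HITS hub scores (an assumption; not the paper's BHIST).

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub of a page: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise so the scores stay bounded across iterations.
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub


class Frontier:
    """Best-first URL queue: the highest-priority unseen URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker that preserves insertion order

    def push(self, url, priority):
        if url in self._seen:
            return  # each URL is queued at most once
        self._seen.add(url)
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In a crawler built along these lines, pages with the highest hub scores would seed the frontier, and each newly discovered URL would be pushed with whatever relevance estimate the ordering algorithm assigns to it.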
