Effect of feature selection method on the performance of focused crawlers—A case study on traditional and accelerated focused crawlers

2010 
This paper examines how the choice of feature selection method affects the performance of the Traditional Focused Crawler (TFC) and the Accelerated Focused Crawler (AFC). Information retrieval methods such as querying a search engine, using a web catalog, and browsing may not satisfy the information needs of all users. When the information requirement concerns a specific topic, focused crawlers complement these methods. The aim of these crawlers is to download web pages that are highly relevant to a pre-defined topic. A naive Bayesian classifier guides the crawlers by rating each web page before it is downloaded. For this analysis, the topics to be crawled are represented by a set of relevant documents. The features the Bayesian classifier uses to construct its model are collected from the document corpus using the Document Frequency and Information Gain feature selection methods. The performance of both crawlers is evaluated when 500 features are selected with each method. The Accelerated Focused Crawler's performance is also evaluated for varying numbers of features gathered with both methods. Target pages recall and target description recall are used to evaluate the crawlers.
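The two feature selection methods named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their formulas, so the sketch assumes the common textbook definitions, where Document Frequency counts the documents containing a term and Information Gain is the reduction in class entropy from knowing whether a term is present. The toy corpus, the labels "relevant" and "irrelevant", and the helper names (document_frequency, information_gain, top_k) are all hypothetical.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs):
    """IG(t) = H(C) - [P(t) * H(C | t) + P(not t) * H(C | not t)]."""
    n = len(docs)
    class_counts = Counter(label for _tokens, label in docs)
    labels = sorted(class_counts)
    h_c = entropy([class_counts[c] for c in labels])

    # Per-term class counts over the documents that contain the term.
    with_term = {}
    for tokens, label in docs:
        for t in set(tokens):
            with_term.setdefault(t, Counter())[label] += 1

    ig = {}
    for t, present in with_term.items():
        n_t = sum(present.values())
        absent = [class_counts[c] - present[c] for c in labels]
        cond = (n_t / n) * entropy([present[c] for c in labels])
        if n - n_t > 0:  # skip the "absent" branch if every document has t
            cond += ((n - n_t) / n) * entropy(absent)
        ig[t] = h_c - cond
    return ig


def top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _score in ranked[:k]]


# Hypothetical toy corpus: (tokens, class label) pairs.
docs = [
    ("crawler web page link".split(), "relevant"),
    ("crawler focused topic page".split(), "relevant"),
    ("recipe page sugar".split(), "irrelevant"),
    ("football page score".split(), "irrelevant"),
]

df = document_frequency(docs)
ig = information_gain(docs)
print(top_k(df, 3))
print(top_k(ig, 3))
```

In this toy corpus the term "page" occurs in every document, so it ranks first by Document Frequency yet has zero Information Gain, while "crawler" occurs only in the relevant documents and ranks first by Information Gain. A study like the one described would pass a value such as k = 500 to top_k and use the selected terms as the classifier's features.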