Focused Crawling for Retrieving Chemical Information

Zhaojie Xia, Li Guo, Chunyang Liang, Xiaoxia Li, Zhangyuan Yang

2007

The exponential growth of resources available on the Web has made it important to develop tools for searching it efficiently. This paper proposes an approach to discovering chemical information using focused crawling. Focused crawlers were implemented with different combinations of feature representations and classification algorithms, and these combinations were compared. Latent Semantic Indexing (LSI) and Mutual Information (MI) were used to extract features from documents, while Naive Bayes (NB) and Support Vector Machines (SVM) were the algorithms selected to compute each document's content relevance score. Of the combinations tested, LSI with SVM performed best.
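
The abstract does not describe the crawler's internals, so the following is only a minimal sketch, assuming a binary setup (chemistry = 1, other = 0), of one of the compared combinations: MI feature selection feeding a Naive Bayes relevance score that drives a best-first crawl. The toy corpus, the in-memory `pages` and `graph` dictionaries standing in for real HTTP fetching, the 0.5 relevance threshold, and the rule that outlinks inherit their parent page's score are all illustrative assumptions, not details from the paper; the function and class names are hypothetical. The LSI + SVM combination the authors report as best would replace `select_features_mi` and `NaiveBayesScorer` and is not shown.

```python
import heapq
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def select_features_mi(docs, labels, k):
    """Keep the k terms with the highest mutual information between
    binary term presence and the class label."""
    n = len(docs)
    token_sets = [set(tokenize(d)) for d in docs]
    vocab = set().union(*token_sets)
    scores = {}
    for term in vocab:
        mi = 0.0
        for present in (True, False):
            n_t = sum(1 for s in token_sets if (term in s) == present)
            for cls in (0, 1):
                n_c = labels.count(cls)
                n_tc = sum(1 for s, y in zip(token_sets, labels)
                           if (term in s) == present and y == cls)
                if n_tc:  # treat 0 * log(0) as 0
                    mi += (n_tc / n) * math.log(n * n_tc / (n_t * n_c))
        scores[term] = mi
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


class NaiveBayesScorer:
    """Multinomial Naive Bayes over the selected features, Laplace-smoothed."""

    def __init__(self, features):
        self.features = features

    def fit(self, docs, labels):
        self.log_prior, self.log_cond = {}, {}
        for cls in (0, 1):
            cls_docs = [d for d, y in zip(docs, labels) if y == cls]
            self.log_prior[cls] = math.log(len(cls_docs) / len(docs))
            counts = Counter(t for d in cls_docs for t in tokenize(d)
                             if t in self.features)
            total = sum(counts.values()) + len(self.features)
            self.log_cond[cls] = {t: math.log((counts[t] + 1) / total)
                                  for t in self.features}
        return self

    def relevance(self, text):
        """Posterior P(chemistry | text), used as the page's relevance score."""
        logp = {cls: self.log_prior[cls] + sum(self.log_cond[cls][t]
                                              for t in tokenize(text)
                                              if t in self.features)
                for cls in (0, 1)}
        top = max(logp.values())  # subtract the max for numerical stability
        odds = {cls: math.exp(v - top) for cls, v in logp.items()}
        return odds[1] / (odds[0] + odds[1])


def focused_crawl(seeds, graph, pages, model, max_pages=10, threshold=0.5):
    """Best-first crawl: always expand the highest-priority URL next."""
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so negate
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched, relevant = 0, []
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        fetched += 1
        score = model.relevance(pages[url])  # "fetch" from the in-memory dict
        if score < threshold:
            continue  # off-topic page: do not follow its links
        relevant.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # inherit parent score
    return relevant


# Hypothetical toy training data: 1 = chemistry, 0 = other.
train_docs = [
    "chemical compound reaction benzene",
    "chemical compound reaction solvent",
    "chemical compound synthesis polymer",
    "sports match goal league",
    "sports match price stock",
    "sports match film review",
]
labels = [1, 1, 1, 0, 0, 0]

features = select_features_mi(train_docs, labels, k=5)
model = NaiveBayesScorer(features).fit(train_docs, labels)

# Hypothetical link graph and page texts.
pages = {
    "a": "chemical reaction news portal",
    "b": "benzene compound synthesis",
    "c": "sports match results",
    "d": "polymer compound reaction",
    "e": "chemical compound database",
}
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}

print(focused_crawl(["a"], graph, pages, model))  # ['a', 'b', 'd']
```

Page `c` scores well below the threshold, so its link to `e` is never followed even though `e` is on-topic. This illustrates the usual trade-off of pruning at off-topic pages: fewer wasted fetches, at the cost of missing relevant pages reachable only through irrelevant ones.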

Keywords:

Information discovery
Artificial intelligence
Machine learning
Naive Bayes classifier
Pattern recognition
Latent semantic indexing
Support vector machine
Mutual information
Computer science
Crawling
Classifier (linguistics)
Data mining
