Implementation method for directional crawler based on assigned e-commerce website

2014 
The invention discloses an implementation method for a directional (focused) crawler targeting an assigned e-commerce website, and belongs to the field of web data collection. It aims to improve the crawler's parsing efficiency and crawling accuracy, to reduce crawler failures caused by changes in website content, and to improve the readability and robustness of the code. Building on a general-purpose crawler, the method manages the order of tasks with a queue and parses website content in multiple threads under a thread pool management mechanism, which improves crawling efficiency. Python is used as the implementation language, and information on an assigned web page is captured by combining CSS (Cascading Style Sheets) selectors with regular expressions, which improves the crawler's parsing efficiency, readability, and fault tolerance. The result is a focused crawler dedicated to parsing store commodity information from the assigned e-commerce website, and it provides a stable and convenient data source for e-commerce price analysis.
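The combination described in the abstract (a task queue, a thread pool for parsing, and selector-plus-regex extraction) can be illustrated with a minimal Python sketch. This is not the patented implementation. The page contents, the URLs, the class names "title" and "price", and every function name below are hypothetical, and a tiny "tag.class" matcher built on the standard library's html.parser stands in for a real CSS selector engine so that the example runs without third-party packages.

```python
import queue
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Hypothetical, already-downloaded pages; a real crawler would fetch these over HTTP.
PAGES = {
    "https://shop.example/item/1":
        '<div class="item"><h2 class="title">Kettle</h2>'
        '<span class="price">Price: CNY 129.00</span></div>',
    "https://shop.example/item/2":
        '<div class="item"><h2 class="title">Lamp</h2>'
        '<span class="price">Now only CNY 59.5!</span></div>',
}

# The regular expression narrows the selected text down to the numeric price.
PRICE_RE = re.compile(r"(\d+(?:\.\d+)?)")


class ClassTextCollector(HTMLParser):
    """Collect the text of elements matching a 'tag.class' selector.

    Assumption: this is a stand-in for a full CSS selector engine.
    """

    def __init__(self, selector):
        super().__init__()
        self.tag, self.cls = selector.split(".")
        self.depth = 0      # greater than 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth and tag == self.tag:
            self.depth += 1  # same tag nested inside a match
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.cls in classes:
                self.depth = 1
                self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def select_text(html, selector):
    """Return the text of every element matching 'tag.class'."""
    collector = ClassTextCollector(selector)
    collector.feed(html)
    return collector.results


def parse_page(url, html):
    """Selector locates the element, regex extracts the value.

    Missing fields become None instead of raising, so a layout change
    degrades the record rather than crashing the crawl.
    """
    titles = select_text(html, "h2.title")
    prices = select_text(html, "span.price")
    match = PRICE_RE.search(prices[0]) if prices else None
    return {
        "url": url,
        "title": titles[0].strip() if titles else None,
        "price": float(match.group(1)) if match else None,
    }


def crawl(pages, workers=4):
    """Queue the URLs in order, then parse them on a thread pool."""
    tasks = queue.Queue()
    for url in pages:
        tasks.put(url)

    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while not tasks.empty():
            url = tasks.get()
            futures.append(pool.submit(parse_page, url, pages[url]))
    # Results are returned in the order the tasks left the queue.
    return [future.result() for future in futures]


if __name__ == "__main__":
    for record in crawl(PAGES):
        print(record)
```

In this sketch the selector step tolerates surrounding markup changes as long as the class names survive, and the regular expression tolerates changes in the text around the number, which is one plausible reading of how the two techniques complement each other.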