Simulated website login to improve network data crawling efficiency

2019 
With the increasing demand for content from news websites, web crawler technology has been widely used to crawl news website data automatically. However, websites often detect and block crawling behavior, which makes crawling inefficient. This paper analyzes the structure of news websites, then designs and implements an efficient simulated login model for crawling that avoids the manual authorization step during the login process. Code tests show that this simulated login method effectively improves the efficiency and data integrity of the crawler.
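As a rough illustration of the general idea of a simulated login (not the paper's actual model, which the abstract does not detail), the sketch below submits login credentials programmatically and reuses the resulting session cookies for later page requests. The URL `https://news.example.com/login`, the form field names `username` and `password`, and the helper names are all hypothetical; a real site defines its own form and may require extra tokens.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoint; a real news site defines its own login URL.
LOGIN_URL = "https://news.example.com/login"


def make_session():
    """Return an opener that stores cookies, together with its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password, url=LOGIN_URL):
    """Build the POST request that a browser login form would submit.

    The field names "username" and "password" are assumptions.
    """
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(
        url,
        data=form.encode("utf-8"),
        headers={
            "User-Agent": "Mozilla/5.0",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def login_and_fetch(opener, username, password, page_url):
    """Log in once, then fetch a page while reusing the session cookies."""
    with opener.open(build_login_request(username, password), timeout=10):
        pass  # cookies set by the login response are now stored in the jar
    with opener.open(page_url, timeout=10) as resp:
        return resp.read()
```

Because the opener keeps the cookies from the login response, later requests made through the same opener are sent as the logged-in session, so no manual step is needed per page. Whether this matches the paper's design is an assumption.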