A Crawler–Parser-Based Approach to Newspaper Scraping and Reverse Searching of Desired Articles

Ankit Aich, Amit Dutta, Aruna Chakraborty

2018

How often do we find that a newspaper does not give us enough information? An article may mention a name we have not heard before, or simply fail to shed enough light on the news and its details. Online newspapers also suffer from webpage noise: every article is surrounded by HTML, meta tags, JavaScript, and other markup. This paper presents a fast and efficient approach that scrapes a newspaper to extract any desired article without the noise, then reverse searches the same topic on Google to obtain a list of the most relevant information about that article. The algorithm supports ten languages and works with major news outlets such as CNN and BBC.
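
The two steps described above, removing page noise and then reverse searching the topic, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which libraries or search interface they use, so the code below assumes only Python's standard library, and the names `ArticleTextExtractor`, `strip_noise`, and `reverse_search_url` are hypothetical. It makes no network requests; it only strips script and style content from an HTML string and builds a Google search URL for a topic.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class ArticleTextExtractor(HTMLParser):
    """Collects visible text and skips the content of noise tags.

    Hypothetical helper for illustration; not taken from the paper.
    """

    NOISE_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a noise tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every noise tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def strip_noise(html):
    """Return the visible text of an HTML string without script/style content."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def reverse_search_url(topic):
    """Build a Google search URL for a topic (assumed search entry point)."""
    return "https://www.google.com/search?" + urlencode({"q": topic})


sample = (
    "<html><head><meta name='x' content='y'>"
    "<script>var a = 1;</script><style>p{color:red}</style></head>"
    "<body><h1>Headline</h1><p>Body text.</p></body></html>"
)
clean = strip_noise(sample)
print(clean)
print(reverse_search_url("Headline Body"))
```

A real crawler would also need to fetch pages, isolate the article body from navigation and advertising, and handle multiple languages, none of which this sketch attempts.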

Keywords:

Web page
Parsing
World Wide Web
Meta element
Newspaper
JavaScript
Web crawler
Computer science
Relevant information
Information retrieval
Crawling
