A Crawler–Parser-Based Approach to Newspaper Scraping and Reverse Searching of Desired Articles

2018 
How often do we find that a newspaper does not give us enough information? An article may mention a name we have not heard before, or simply fail to shed enough light on the news and its details. Online newspapers also suffer from webpage noise: every article is surrounded by HTML, meta tags, JavaScript, and other markup. This paper presents a fast and efficient approach to scraping a newspaper to retrieve any desired article without the noise, and then reverse searching the same topic on Google to obtain a list of the most relevant information about that article. The algorithm supports ten languages and works with prominent news sources such as CNN and BBC.
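The abstract does not describe the implementation, so the following is only a minimal sketch of the two steps it names: stripping page noise from an article and forming a reverse-search query. It assumes that article text lives in `<p>` elements and that a plain Google search URL is sufficient for the reverse search. The names `ArticleTextExtractor`, `clean_article`, and `reverse_search_url` are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class ArticleTextExtractor(HTMLParser):
    """Collects text inside <p> tags and ignores <script>/<style> content.

    Assumption: the article body is marked up as <p> elements; real
    newspaper pages may need site-specific rules.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0        # > 0 while inside <script> or <style>
        self._in_paragraph = False  # True while inside a <p> element
        self._chunks = []           # text pieces of the current paragraph
        self.paragraphs = []        # cleaned paragraphs, in page order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p":
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.paragraphs.append(text)
            self._chunks = []

    def handle_data(self, data):
        if self._in_paragraph and not self._skip_depth:
            self._chunks.append(data)


def clean_article(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def reverse_search_url(title):
    """Build a Google search URL for an article title (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"q": title})
```

Fetching the page itself (the crawler step) is left out so that the sketch runs offline; `clean_article` can be applied to any HTML string obtained by other means.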