Relevant Data Node Extraction: A Web Data Extraction Method for Non Contagious Data

2020 
The Internet is expanding rapidly, and millions of HTML pages are created daily. Many of these pages are generated by content management systems such as WordPress and Joomla, or by other software programs. These programs query data from one or more associated databases and fill a page template with the results, so the generated pages contain well-structured data; this paper calls these well-structured units data nodes. Data nodes are important because they hold the structured data of a page. This paper proposes a novel technique, Relevant Data Node Extraction (RDNE), that automatically detects and extracts relevant data nodes from HTML pages. The algorithm is based on a set of rules that were derived from observation and then implemented. The authors report excellent results for the proposed technique.
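The abstract does not list RDNE's actual rules, so the following is only a hypothetical sketch of the general idea it builds on: when a program fills a template with database rows, each row tends to be rendered with the same tag structure, so repeated sibling subtrees are candidate data nodes. The heuristic used here (group siblings by a tag-structure signature and keep groups with at least `min_repeats` members that have nested children) and the names `TreeBuilder`, `signature`, `find_data_nodes`, and `min_repeats` are illustrative assumptions, not the paper's rule set. It uses only Python's standard-library `html.parser`.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag (assumed subset, for this sketch only).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []  # non-blank text found directly inside this element


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into elements that can hold content

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def signature(node):
    """Tag structure of a subtree, e.g. 'li(span(),span())'."""
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def all_text(node):
    """Text of a subtree: the node's own text first, then its children's."""
    parts = list(node.text)
    for child in node.children:
        parts.extend(all_text(child))
    return parts


def find_data_nodes(root, min_repeats=2):
    """Hypothetical heuristic: sibling subtrees sharing one tag structure."""
    groups = []
    stack = [root]
    while stack:
        node = stack.pop()
        by_sig = defaultdict(list)
        for child in node.children:
            by_sig[signature(child)].append(child)
        for members in by_sig.values():
            # Require nested structure so repeated leaf tags are skipped.
            if len(members) >= min_repeats and any(m.children for m in members):
                groups.append([all_text(m) for m in members])
        stack.extend(node.children)
    return groups


page = """
<html><body>
  <div class="header"><h1>Shop</h1></div>
  <ul>
    <li><span>Pen</span><span>1.50</span></li>
    <li><span>Ink</span><span>4.00</span></li>
    <li><span>Pad</span><span>2.25</span></li>
  </ul>
</body></html>
"""
builder = TreeBuilder()
builder.feed(page)
builder.close()
print(find_data_nodes(builder.root))
# [[['Pen', '1.50'], ['Ink', '4.00'], ['Pad', '2.25']]]
```

In this toy page the header appears once and is ignored, while the three `<li>` records share the signature `li(span(),span())` and are returned as one group of data nodes. A real system would need further rules, for example to separate relevant records from repeated navigation menus, which is the kind of distinction the paper's rule set is meant to address.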