An Approach for Improving DBpedia as a Research Data Hub.

2020 
Extracted from Wikipedia content, DBpedia is considered one of the most important knowledge bases of the Semantic Web. It has editions in several languages, including English (DBpedia EN) and Portuguese (DBpedia PT). All DBpedia editions are subject to quality issues; DBpedia PT in particular suffers from inconsistencies and missing data in several domains. This paper describes a semi-automatic, incremental process for publishing data from reliable external sources on DBpedia while seeking to improve aspects of its quality. In an open science context, the proposal aims to consolidate DBpedia as a reference hub for research data, so that research in any area supported by Semantic Web data can use it reliably. Although the approach is independent of any specific DBpedia edition, the supporting prototype tool, named ETL4DBpedia, was built for DBpedia PT and is based on ETL (Extract, Transform, Load) workflows. This paper also describes the assessment of the approach, in which the tool was applied in a real-usage scenario involving data from the field of botany. This application resulted in a 127% increase in the completeness of medicinal plant species in DBpedia PT and showed satisfactory performance for the ETL4DBpedia components.