Linked data processing provenance: towards transparent and reusable linked data integration

2017 
The growth of Linked Data has created a promising environment for data exploration, and a growing number of tools allow users to interactively integrate data from various sources. However, assessing the reliability of the results of such ad-hoc integration processes, consistently recreating those results, and identifying changes upon re-execution can be difficult. Automated creation of process provenance trails can provide major benefits in this context, because (i) it enables users to trace the contribution of individual sources and processing steps to the final outcome and judge whether the result can be trusted; (ii) it ensures repeatability and raises the trustworthiness of results; and (iii) it ideally enables reconstruction of Linked Data integration processes from the provenance information embedded in the final result. In this paper, we present a provenance model that facilitates automatic generation of semantic provenance information for generic Linked Data integration processes. We implement the generic model in a collaborative mashup environment and evaluate it by means of an example application. We find that the model provides a solid foundation for verifiability and contributes towards making Linked Data integration processes more open, transparent, and reusable, which is crucial in domains where the origin of data is essential, such as statistical analyses, scientific research, and data journalism.