PROV-man: A PROV-compliant toolkit for provenance management.

2015 
Discoveries in modern science can take years and involve the contribution of large amounts of data, many people, and various tools. Although good scientific practice dictates that findings should be reproducible, in practice there are very few automated tools that actually support traceability of the scientific method employed, in particular when various experimental environments are involved at different research phases. Data provenance tracking approaches can play a major role in addressing many of these challenges. These approaches propose ways to capture, manage, and use provenance information to support the traceability of scientific methods in heterogeneous environments. PROV is a W3C standard that provides a comprehensive model for data and semantics representation, with common vocabularies and rich concepts to describe provenance. Nevertheless, it is difficult for domain scientists to easily understand and adopt all the richness provided by PROV. In this paper we describe the design and implementation of the provenance manager PROV-man, a PROV-compliant framework that facilitates the tasks of scientists in integrating provenance capabilities into their data analysis tools. PROV-man provides functionalities to create and manipulate provenance data in a consistent manner and ensures its permanent storage. It also provides a set of interfaces to serialize and export provenance data into various data formats, serving interoperability. The open architecture of PROV-man, consisting of an API and a configurable database, allows for its easy deployment within existing and newly developed software tools. The paper presents examples illustrating the usage of PROV-man. The first example illustrates how to create and manipulate provenance data of an online newspaper article using PROV-man. The second example demonstrates and evaluates the PROV-man implementation in a more complex case: the collection of provenance data about biomedical data analysis activities that are carried out using a distributed computing infrastructure.
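To make the newspaper-article example more concrete, the following is a minimal sketch of how provenance for an article might be recorded with the core PROV concepts (entity, activity, agent) and then exported as text. It is not the PROV-man API: the class `ProvDocument`, its method names, the `ex` prefix, and the `http://example.org/` namespace are all assumptions made for illustration, and the exported text is only PROV-N-inspired and is not guaranteed to validate against the PROV-N grammar.

```python
from dataclasses import dataclass, field

NODE_KINDS = ("entity", "activity", "agent")


@dataclass
class ProvDocument:
    """Tiny in-memory provenance store (hypothetical, not the PROV-man API)."""

    namespace: str
    statements: list = field(default_factory=list)

    def _add(self, kind, *args):
        # Store each statement once so repeated calls do not create duplicates.
        stmt = (kind, args)
        if stmt not in self.statements:
            self.statements.append(stmt)
        return args[0]

    def _require(self, *names):
        # A relation may only refer to nodes that were declared beforehand.
        declared = {args[0] for kind, args in self.statements if kind in NODE_KINDS}
        missing = [n for n in names if n not in declared]
        if missing:
            raise ValueError(f"undeclared identifiers: {missing}")

    def entity(self, name):
        return self._add("entity", name)

    def activity(self, name):
        return self._add("activity", name)

    def agent(self, name):
        return self._add("agent", name)

    def was_generated_by(self, entity, activity):
        self._require(entity, activity)
        self._add("wasGeneratedBy", entity, activity)

    def was_associated_with(self, activity, agent):
        self._require(activity, agent)
        self._add("wasAssociatedWith", activity, agent)

    def was_attributed_to(self, entity, agent):
        self._require(entity, agent)
        self._add("wasAttributedTo", entity, agent)

    def to_text(self):
        # PROV-N-inspired rendering; an assumption, not a validated serializer.
        lines = ["document", f"  prefix ex <{self.namespace}>"]
        for kind, args in self.statements:
            rendered = ", ".join(f"ex:{a}" for a in args)
            lines.append(f"  {kind}({rendered})")
        lines.append("endDocument")
        return "\n".join(lines)


# Provenance of an online newspaper article (all names are illustrative).
doc = ProvDocument("http://example.org/")
article = doc.entity("article")
writing = doc.activity("writing")
journalist = doc.agent("journalist")
doc.was_generated_by(article, writing)
doc.was_associated_with(writing, journalist)
doc.was_attributed_to(article, journalist)
print(doc.to_text())
```

The check in `_require` stands in, very loosely, for the consistency guarantees the abstract attributes to PROV-man; a real toolkit would also persist the statements in a database and support several export formats.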