Interactive provenance summaries for reproducible science

2016 
Recorded provenance facilitates reproducible science. Provenance metadata can help determine how data were possibly transformed, processed, and derived from original sources. While provenance is crucial for verification and validation, there remains the issue of the granularity — detail at which provenance data must be provided to a user, especially for conducting reproducible science. When data are reproduced successfully the need for detailed provenance is minimal and an essence of the recorded provenance suffices. However, when data are not reproduced correctly users want to quickly drill down into fine-grained provenance to understand causes for failure. In this paper, we describe a drill-up/drill-down method for exploring provenance traces. The drill-up method summarizes the trace by grouping nodes and edges of the trace that have same derivation histories. The method preserves provenance data flow semantics. The drill-down method compares summary groups and ranks groups that may have information about the errors. Both the methods are implemented in an efficient manner using light-weight data structures so as to be suitable for reproducible science. We conduct a thorough experimental analysis to show how the operators perform in compressing and expanding real provenance graphs.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    9
    References
    4
    Citations
    NaN
    KQI
    []