Reproducible Scientific Workflows for High Performance and Cloud Computing

2019 
Many complex data analysis tasks are performed by scientific workflows and pipelines deployed on high performance computing (HPC) or cloud computing resources. The complex software stack that a workflow requires, together with unnoticed dependencies, can make deploying a pipeline a demanding task. Once deployed, workflows tend to be black boxes, especially for users who did not create the pipeline themselves. At the end of a project, a researcher should archive the pipeline to ensure that published results remain reproducible. This paper illustrates a possible solution for each of these three tasks: reproducible deployment via software containers, automated generation of provenance information to open up black boxes, and archiving of software containers with the CiTAR service.