Reproducibility and reusability limitations in Regulatory Circuits: analysis and solutions

2021 
The Regulatory Circuits project is among the most recent and most complete attempts to identify cell-type-specific regulatory networks in humans. It is one of the largest efforts to integrate public genomics data, drawing on data from the major consortia FANTOM5, ENCODE and Roadmap Epigenomics. The project is a major provider of biological data: it has been cited more than 224 times (Google Scholar), and its resulting networks have been used in at least 42 other articles. For such a general resource, reproducibility of both the outputs (regulatory networks) and the methods (data integration pipeline) is a major issue, since biological data are updated regularly. In addition, users may want to introduce new data into the Regulatory Circuits framework, either to provide networks for previously uncharacterized cell types or to add information about specific regulators. Either case requires re-executing the whole pipeline on the new data. In this article, we analyze the various factors limiting the reproducibility of the Regulatory Circuits data and methods. Starting from a factual description of our understanding of the methods used in Regulatory Circuits, our contribution is twofold. We propose (1) a characterization of the different levels of reusability and reproducibility issues, as well as conceptual issues, in the original workflow, and (2) a new implementation of the workflow that ensures its consistency with the published description and allows easier reuse and reproduction of the published outputs. Both contributions are applicable beyond the case of Regulatory Circuits.