Reproducible data management and analysis using R

2019 
Background: Standardizing and documenting computational analyses are necessary to ensure reproducible results. This is especially important for large and complex projects where data collection, analysis, and interpretation may span decades. Our objective is therefore to provide methods, tools, and best practice guidelines adapted for analyses in epidemiological studies that use -omics data.

Results: We describe an R-based implementation of data management and preprocessing. The method is well integrated with the tools typically used for statistical analysis of -omics data. We document all datasets thoroughly and use version control to track changes to both datasets and code over time. We provide a web application that performs the standardized preprocessing steps for gene expression datasets. We also provide best practices for reporting data analysis results and sharing analyses.

Conclusion: We have used these tools to organize data storage and documentation, and to standardize the analysis of gene expression data, in the Norwegian Women and Cancer (NOWAC) system epidemiology study. We believe our approach and lessons learned are applicable to analyses in other large and complex epidemiology projects.