CellHeap: A Workflow for Optimizing COVID-19 Single-Cell RNA-Seq Data Processing in the Santos Dumont Supercomputer

2021 
Currently, several hundreds of Terabytes of COVID-19 single-cell RNA-seq (scRNA-seq) data are available in public repositories. This data refers to multiple tissues, comorbidities, and conditions. We expect this trend to continue, and it is realistic to predict amounts of COVID-19 scRNA-seq data increasing to several Petabytes in the coming years. However, thoughtful analysis of this data requires large-scale computing infrastructures, and software systems optimized for such platforms to generate biological knowledge. This paper presents CellHeap, a portable and robust workflow for scRNA-seq customizable analyses, with quality control throughout the execution steps and deployable on supercomputers. Furthermore, we present the deployment of CellHeap in the Santos Dumont supercomputer for analyzing COVID-19 scRNA-seq datasets, and discuss a case study that processed dozens of Terabytes of COVID-19 scRNA-seq raw data.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    36
    References
    0
    Citations
    NaN
    KQI
    []