Comprehensive generation, visualization, and reporting of quality control metrics for single-cell RNA sequencing data

2020 
Comprehensive quality control is necessary to remove technical and biological artifacts from single-cell RNA sequencing (scRNA-seq) data. Artifacts such as doublets and ambient RNA can hinder downstream clustering and marker selection and therefore need to be assessed. Although several algorithms have been developed for individual quality control tasks, they are spread across different packages and programming environments, and no standardized workflow exists to streamline the generation and reporting of the quality control metrics from these tools. We have built an easy-to-use pipeline, SCTK-QC, in the singleCellTK package that generates a comprehensive set of quality control metrics from multiple quality control packages. The pipeline imports data from several preprocessing tools, including CellRanger, STARSolo, BUSTools, dropEST, Optimus, and SEQC. Standard quality control metrics are calculated for each cell, including the total number of UMIs, the total number of genes detected, and the percentage of counts mapping to predefined gene sets such as mitochondrial genes. The doublet detection algorithms employed include scrublet, scds, doubletCells, and doubletFinder. DecontX is used to identify ambient RNA contamination in each individual cell. To make the data accessible in downstream analysis workflows, the results can be exported to common data structures in R and Python or to text files for use in any generic workflow. Overall, this pipeline streamlines and standardizes quality control analyses for scRNA-seq data across different platforms.
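The standard per-cell metrics named in the abstract (total UMIs, genes detected, and percentage of counts in a predefined gene set) can be illustrated with a minimal sketch. This is not the SCTK-QC implementation: the function name `per_cell_qc`, the genes-by-cells list-of-lists layout, and the `MT-` prefix used to pick out mitochondrial genes (a convention for human gene symbols) are all assumptions made for illustration.

```python
def per_cell_qc(counts, gene_names, mito_prefix="MT-"):
    """Compute basic per-cell QC metrics from a genes x cells count matrix.

    counts      -- list of rows, one row per gene, one column per cell (assumed layout)
    gene_names  -- gene symbols, in the same order as the rows of counts
    mito_prefix -- hypothetical rule for selecting the mitochondrial gene set
    """
    n_cells = len(counts[0]) if counts else 0
    # Flag the genes that belong to the predefined gene set.
    is_mito = [name.upper().startswith(mito_prefix) for name in gene_names]

    metrics = []
    for j in range(n_cells):
        column = [row[j] for row in counts]
        total_umis = sum(column)
        genes_detected = sum(1 for c in column if c > 0)
        mito_umis = sum(c for c, m in zip(column, is_mito) if m)
        # Report 0.0 for empty cells instead of dividing by zero.
        pct_mito = 100.0 * mito_umis / total_umis if total_umis > 0 else 0.0
        metrics.append({
            "total_umis": total_umis,
            "genes_detected": genes_detected,
            "pct_mito": pct_mito,
        })
    return metrics


# Toy example: 3 genes x 3 cells; the third cell has no counts at all.
genes = ["MT-CO1", "ACTB", "GAPDH"]
counts = [
    [2, 0, 0],
    [5, 3, 0],
    [3, 0, 0],
]
qc = per_cell_qc(counts, genes)
print(qc[0])  # {'total_umis': 10, 'genes_detected': 3, 'pct_mito': 20.0}
```

In practice these values are typically computed on sparse matrices by the underlying packages; the sketch only shows what each metric means for a single cell.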