Digitaldlsorter: Deep-Learning on scRNA-Seq to Deconvolute Gene Expression Data

2019 
The development of single cell transcriptome sequencing has allowed researchers the possibility to dig inside the role of the individual cell types in a plethora of disease scenarios. It also expands to the whole transcriptome what was only possible for a few tenths of antibodies in cell population analysis. Importantly, it allows resolving the permanent question of whether the changes observed in a particular bulk experiment are a consequence of changes in cell type proportions or an aberrant behavior of a particular cell type. However, single cell experiments are still complex to perform and expensive to sequence making bulk RNA-Seq experiments more common. scRNA-Seq data is proving highly relevant for the characterization of the immune cell repertoire in different diseases ranging from cancer to atherosclerosis. In particular, as scRNA-Seq becomes more widely used, new types of immune cell populations emerge and their role in the genesis and evolution of the disease opens new avenues for personalized immune therapies. Immunotherapy have proven successful in a variety of tumors such breast, colon and melanoma. From a statistical perspective, single-cell data are particularly interesting due to its high dimensionality, overcoming the limitations of the “skinny matrix” that traditional bulk RNA-Seq experiments yield. With the technological advances that enable sequencing hundreds of thousands of cells, scRNA-Seq data have become especially suitable for the application of Machine Learning algorithms such as Deep Learning (DL). We present here a DL based method to enumerate and quantify the immune infiltration in colorectal and breast cancer bulk RNA-Seq samples starting from scRNA-Seq. Our method makes use of a Deep Neural Network (DNN) model that allows quantification not only of lymphocytes as a general population but also of specific CD8+, CD4Tmem, CD4Th and CD4Tregs subpopulations, as well as B-cells and Stromal content. Moreover, the signatures are built from scRNA-Seq data from the tumor, preserving the specific characteristics of the tumor microenvironment as opposite to other approaches in which cells were isolated from blood. Our method was applied to synthetic bulk RNA-Seq and to samples from the TCGA project yielding very accurate results in terms of quantification and survival prediction.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    48
    References
    11
    Citations
    NaN
    KQI
    []