Explainability methods for differential gene analysis of single cell RNA-seq clustering models

2021 
Single-cell RNA sequencing (scRNA-seq) produces transcriptomic profiling for individual cells. Due to the lack of cell-class annotations, scRNA-seq is routinely analyzed with unsupervised clustering methods. Because these methods are typically limited to producing clustering predictions (that is, assignment of cells to clusters of similar cells), numerous model agnostic differential expression (DE) libraries have been proposed to identify the genes expressed differently in the detected clusters, as needed in the downstream analysis. In parallel, the advancements in neural networks (NN) brought several model-specific explainability methods to identify salient features based on gradients, eliminating the need for external models. We propose a comprehensive study to compare the performance of dedicated DE methods, with that of explainability methods typically used in machine learning, both model agnostic (such as SHAP, permutation importance) and model-specific (such as NN gradient-based methods). The DE analysis is performed on the results of 3 state-of-the-art clustering methods based on NNs. Our results on 36 simulated datasets indicate that all analyzed DE methods have limited agreement between them and with ground-truth genes. The gradients method outperforms the traditional DE methods, which en-courages the development of NN-based clustering methods to provide an out-of-the-box DE capability. Employing DE methods on the input data preprocessed by clustering method outperforms the traditional approach of using the original count data, albeit still performing worse than gradient-based methods.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    12
    References
    0
    Citations
    NaN
    KQI
    []