Snehalika Lall

Indian Statistical Institute

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Sumanta Ray

Aliah University

Sanghamitra Bandyopadhyay

Indian Statistical Institute

Abhik Ghosh

Indian Statistical Institute

Amit Konar

Jadavpur University

Sanghamitra Bandyopadhyay

Indian Institute of Toxicology Research

Alexander Schönhuth

Bielefeld University

Debarka Sengupta

Indraprastha Institute of Information Technology Delhi

Debajyoti Sinha

Florida State University

Anirban Mukhopadhyay

University of Kalyani

Sanchita Ghosh

Bhabha Atomic Research Centre

Cooperative Institutions

Jadavpur University

Indian Statistical Institute

Liverpool Hope University

University of Delhi

Suntory (United Kingdom)

London School of Economics and Political Science

Galgotias University

University of Kalyani

Bielefeld University

Centrum Wiskunde & Informatica

Research Field

sc-REnF: An entropy guided robust feature selection for single-cell RNA-seq data

Briefings in Bioinformatics (2021)

Snehalika Lall Abhik Ghosh Sumanta Ray Sanghamitra Bandyopadhyay

Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Since single-cell data are susceptible to technical noise, the quality of genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis. Therefore, robust gene selection has gained considerable attention in recent years. We introduce sc-REnF [robust entropy based feature (gene) selection method], aiming to leverage the advantages of Rényi and Tsallis entropies in gene selection for single-cell clustering. Experiments demonstrate that, with a tuned parameter ($q$), Rényi and Tsallis entropies select genes that improve the clustering results significantly over the competing methods. sc-REnF captures relevancy and redundancy among the features of noisy data extremely well due to its robust objective function. Moreover, the selected features/genes can determine the unknown cells with high accuracy. Finally, sc-REnF yields good clustering performance in small sample, large feature scRNA-seq data. Availability: sc-REnF is available at https://github.com/Snehalikalall/sc-REnF
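The two entropies named in the abstract have standard textbook definitions, which can be sketched as follows. This is only an illustration of the definitions for a discrete distribution and an order parameter q; it is not the sc-REnF objective function, and the function names and example distributions are assumptions made for this sketch.

```python
import math

def renyi_entropy(probs, q):
    """Renyi entropy of order q (q > 0, q != 1), in nats:
    H_q(p) = log(sum_i p_i^q) / (1 - q)."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    power_sum = sum(p ** q for p in probs if p > 0)
    return math.log(power_sum) / (1.0 - q)

def tsallis_entropy(probs, q):
    """Tsallis entropy of order q (q > 0, q != 1):
    S_q(p) = (1 - sum_i p_i^q) / (q - 1)."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    power_sum = sum(p ** q for p in probs if p > 0)
    return (1.0 - power_sum) / (q - 1.0)

# Hypothetical example distributions (not taken from the paper).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

for name, dist in (("uniform", uniform), ("skewed", skewed)):
    print(name, renyi_entropy(dist, 2.0), tsallis_entropy(dist, 2.0))
```

Both quantities are largest for the uniform distribution and shrink as the distribution becomes more concentrated; q controls how strongly high-probability outcomes dominate the sum.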

RNA-Seq

10.1093/bib/bbab517

Citations (18)

A copula based topology preserving graph convolution network for clustering of single-cell RNA seq data

bioRxiv (Cold Spring Harbor Laboratory) (2021)

Snehalika Lall Sumanta Ray Sanghamitra Bandyopadhyay

Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Various issues in single cell sequencing affect the homogeneous grouping (clustering) of cells, such as a small amount of starting RNA, limited per-cell sequenced reads, cell-to-cell variability due to cell cycle, cellular morphology, and variable reagent concentrations. Moreover, single cell data are susceptible to technical noise, which affects the quality of genes (or features) selected/extracted prior to clustering. Here we introduce sc-CGconv (copula based graph convolution network for single cell clustering), a stepwise robust unsupervised feature extraction and clustering approach that formulates and aggregates cell–cell relationships using copula correlation (Ccor), followed by a graph convolution network based clustering approach. sc-CGconv formulates a cell-cell graph using Ccor, which is then learned by a graph-based artificial intelligence model, a graph convolution network. The learned representation (low dimensional embedding) is utilized for cell clustering. sc-CGconv features the following advantages: (a) it works with substantially smaller sample sizes to identify homogeneous clusters; (b) it can model the expression co-variability of a large number of genes, thereby outperforming state-of-the-art gene selection/extraction methods for clustering; (c) it preserves the cell-to-cell variability within the selected gene set by constructing a cell-cell graph through the copula correlation measure; (d) it provides a topology-preserving embedding of cells in low dimensional space. The source code and usage information are available at https://github.com/Snehalikalall/CopulaGCN Contact: sumanta.ray@cwi.nl

Graph Embedding

Hierarchical clustering

10.1101/2021.11.15.468695

Citations (0)

Enhancing Single-Cell RNA-seq Data Completeness with a Graph Learning Framework

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2024)

Snehalika Lall Sumanta Ray Sanghamitra Bandyopadhyay

Single cell RNA sequencing (scRNA-seq) is a powerful tool to capture gene expression snapshots in individual cells. However, the low amount of RNA in individual cells results in dropout events, which introduce a large number of zero counts in the single cell expression matrix. We have developed VAImpute, a variational graph autoencoder based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data, leveraging copula correlation (Ccor) among cells/genes. The trained model is utilized to predict the dropout events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets, comparing it to established single-cell imputation methods. VAImpute yields significant improvements in detecting dropouts, thereby achieving superior performance in cell clustering, rare cell detection, and differential expression analysis. All code and datasets are available at the GitHub link: https://github.com/sumantaray/VAImputeAvailability.

Completeness (order theory)

RNA-Seq

10.1109/tcbb.2024.3492384

Citations (0)

sc-REnF: An Entropy Guided Robust Feature Selection for Single-Cell RNA-seq Data

Research Square (Research Square) (2021)

Snehalika Lall Abhik Ghosh Sumanta Ray Sanghamitra Bandyopadhyay

Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Since single cell data are susceptible to technical noise, the quality of genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis. Therefore, robust gene selection has gained considerable attention in recent years. We introduce sc-REnF (robust entropy based feature (gene) selection method), aiming to leverage the advantages of Rényi and Tsallis entropies in gene selection for single cell clustering. Experiments demonstrate that, with a tuned parameter (q), Rényi and Tsallis entropies select genes that improve the clustering results significantly over the competing methods. sc-REnF captures relevancy and redundancy among the features of noisy data extremely well due to its robust objective function. Moreover, the selected features/genes can cluster the unknown cells with high accuracy. Finally, sc-REnF yields good clustering performance in small sample, large feature scRNA-seq data.

Leverage (statistics)

10.21203/rs.3.rs-355014/v1

Citations (3)

An l1-Norm Regularized Copula Based Feature Selection

Snehalika Lall Sanghamitra Bandyopadhyay

In this paper, we develop a novel feature selection method called RCFS (Regularized Copula based Feature Selection) based on regularized copula. We use l1 regularization, as it penalizes the redundant coefficients of features and drives them to zero, resulting in a non-redundant, effective feature set. The scale-invariant property of copula ensures good performance in noisy data, thereby improving the stability of the method. Three different forms of copula, viz. Gaussian copula, empirical copula, and Archimedean copula, are used with l1 regularization. Results show a significant improvement in the accuracy of the prediction model over any non-regularized feature selection method. The number of features needed to achieve a fixed accuracy value is also smaller than for any other non-regularized feature selection technique.
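The way an l1 penalty drives coefficients to exactly zero can be illustrated with the soft-thresholding operator, which is the proximal step of the l1 norm. This is a generic sketch of that mechanism, not the RCFS objective or its optimizer; the weights, the threshold value, and the function name are hypothetical.

```python
def soft_threshold(weight, lam):
    """Proximal operator of lam * |w|: shrink the weight toward zero by lam,
    and set it to exactly zero when its magnitude is at most lam."""
    if weight > lam:
        return weight - lam
    if weight < -lam:
        return weight + lam
    return 0.0

# Hypothetical feature weights before the l1 shrinkage step.
weights = [0.9, -0.05, 0.3, 0.02, -0.6]
lam = 0.1  # assumed regularization strength

shrunk = [soft_threshold(w, lam) for w in weights]

# Features whose weight survives the shrinkage are the ones kept.
selected = [i for i, w in enumerate(shrunk) if w != 0.0]
print(shrunk)
print(selected)
```

Weights with small magnitude are zeroed and the corresponding features drop out, which is why an l1 term tends to yield a sparse feature set.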

Regularization

10.1145/3386164.3386177

Citations (2)

Generating realistic cell samples for gene selection in scRNA-seq data: A novel generative framework

bioRxiv (Cold Spring Harbor Laboratory) (2021)

Snehalika Lall Sumanta Ray Sanghamitra Bandyopadhyay

High dimensional, small sample size (HDSS) scRNA-seq data present a challenge to the gene selection task in single cell analysis. Conventional gene selection techniques are unstable and less reliable due to the small number of available samples, which affects cell clustering and annotation. Here, we present an improved version of the generative adversarial network (GAN), called LSH-GAN, to address this issue by producing new realistic samples and combining them with the original scRNA-seq data. We update the training procedure of the generator of the GAN using locality sensitive hashing, which speeds up the sample generation, thus maintaining the feasibility of applying gene selection procedures in high dimensional scRNA-seq data. Experimental results show a significant improvement in the performance of benchmark feature (gene) selection techniques on generated samples of one synthetic and four HDSS scRNA-seq datasets. A comprehensive simulation study ensures the applicability of the model in the feature (gene) selection domain of HDSS scRNA-seq data. Availability: The corresponding software is available at https://github.com/Snehalikalall/LSH-GAN
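Locality sensitive hashing can be sketched with random hyperplanes (sign random projections), one common LSH family for cosine similarity. The abstract does not say which LSH family LSH-GAN uses or how it enters the generator update, so everything below (function names, the number of bits, the toy vectors) is an assumption for illustration only.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vector)) >= 0 else 0
        for h in hyperplanes
    )

def bucket_cells(cells, hyperplanes):
    """Group cell indices by signature; similar directions tend to collide."""
    buckets = {}
    for idx, cell in enumerate(cells):
        buckets.setdefault(lsh_signature(cell, hyperplanes), []).append(idx)
    return buckets

# Toy expression vectors (hypothetical): the second is a scaled copy of the first.
cells = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-3.0, 0.5, -1.0]]
planes = make_hyperplanes(n_bits=8, dim=3)
print(bucket_cells(cells, planes))
```

Because only the sign of each projection is kept, vectors pointing in the same direction always share a bucket, so candidate neighbours can be looked up without comparing every pair of cells.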

Benchmark (surveying)

Sample (material)

Feature (linguistics)

10.1101/2021.04.29.441920

Citations (6)

RgCop-A regularized copula based method for gene selection in single-cell RNA-seq data

PLoS Computational Biology (2021)

Snehalika Lall Sumanta Ray Sanghamitra Bandyopadhyay

Gene selection in unannotated large single cell RNA sequencing (scRNA-seq) data is an important and crucial step in the preliminary stage of downstream analysis. The existing approaches, which are primarily based on high variation (highly variable genes) or significantly high expression (highly expressed genes), fail to provide a stable and predictive feature set due to technical noise present in the data. Here, we propose RgCop, a novel regularized copula based method for gene selection from large single cell RNA-seq data. RgCop utilizes copula correlation (Ccor), a robust equitable dependence measure that captures multivariate dependency among a set of genes in single cell expression data. We formulate an objective function by adding an l1 regularization term to Ccor to penalize the redundant coefficients of features/genes, resulting in a non-redundant, effective set of features/genes. Results show a significant improvement in the clustering/classification performance on real-life scRNA-seq data over other state-of-the-art methods. RgCop performs extremely well in capturing dependence among the features of noisy data due to the scale invariant property of copula, thereby improving the stability of the method. Moreover, the differentially expressed (DE) genes identified from the clusters of scRNA-seq data are found to provide an accurate annotation of cells. Finally, the features/genes obtained from RgCop are able to annotate the unknown cells with high accuracy.
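The scale invariance mentioned above comes from the fact that a copula depends only on the ranks of the observations. A minimal sketch, assuming rank-based pseudo-observations (rank divided by n + 1) and no tied values: any strictly increasing rescaling of the expression values leaves them unchanged. This does not reproduce the paper's Ccor measure; the function name and the sample values are hypothetical.

```python
import math

def pseudo_observations(values):
    """Map values into (0, 1) via rank / (n + 1); assumes no tied values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]

# Hypothetical expression values for one gene across five cells.
expr = [0.5, 3.2, 1.1, 7.8, 2.4]

# A strictly increasing transformation (scaling followed by log1p).
rescaled = [math.log1p(10.0 * x) for x in expr]

print(pseudo_observations(expr))
print(pseudo_observations(rescaled))
```

Both calls print the same list, so any dependence measure computed from these pseudo-observations is unaffected by such rescaling of the raw counts.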

10.1371/journal.pcbi.1009464

Citations (13)

Structure-Aware Principal Component Analysis for Single-Cell RNA-seq Data

Journal of Computational Biology (2018)

Snehalika Lall Debajyoti Sinha Sanghamitra Bandyopadhyay Debarka Sengupta

With the emergence of droplet-based technologies, it has now become possible to profile transcriptomes of several thousands of cells in a day. Although such a large single-cell cohort may favor the discovery of cellular heterogeneity, it also brings new challenges in the prediction of minority cell types. Identification of any minority cell type holds a special significance in knowledge discovery. In the analysis of single-cell expression data, the use of principal component analysis (PCA) is surprisingly frequent for dimension reduction. The principal directions obtained from PCA are usually dominated by the major cell types in the concerned tissue. Thus, it is very likely that using a traditional PCA may endanger the discovery of minority populations. To this end, we propose locality-sensitive PCA (LSPCA), a scalable variant of PCA equipped with structure-aware data sampling at its core. Structure-aware sampling provides PCA with a neutral spread of the data, thereby reducing the bias in its principal directions arising from the redundant samples in a data set. We benchmarked the performance of the proposed method on ten publicly available single-cell expression data sets including one very large annotated data set. Results have been compared with traditional PCA and PCA with random sampling. Clustering results on the annotated data sets also show that LSPCA can detect the minority populations with a higher accuracy.

Data set

Identification

Biomarker Discovery

Sparse PCA

10.1089/cmb.2018.0027

Citations (48)

LSH-GAN: in-silico generation of cells for small sample high dimensional scRNA-seq data

Research Square (Research Square) (2021)

Snehalika Lall Sumanta Ray Sanghamitra Bandyopadhyay

A fundamental problem in downstream analysis of scRNA-seq data is the unavailability of enough cell samples compared to the feature size. This is mostly due to the budgetary constraints of single cell experiments or simply because of the small number of available patient samples. Here, we present an improved version of the generative adversarial network (GAN), called LSH-GAN, to address this issue by producing new realistic cell samples. We update the training procedure of the generator of the GAN using locality sensitive hashing, which speeds up the sample generation, thus maintaining the feasibility of applying the standard procedures of downstream analysis. LSH-GAN outperforms the benchmarks for realistic generation of quality cell samples. Experimental results show that samples generated by LSH-GAN improve the performance of downstream analysis such as feature (gene) selection and cell clustering.

Unavailability

Sample (material)

Feature (linguistics)

Locality-sensitive hashing

10.21203/rs.3.rs-736403/v1

Citations (1)

Stable feature selection using copula based mutual information

Pattern Recognition (2020)

Snehalika Lall Debajyoti Sinha Abhik Ghosh Debarka Sengupta Sanghamitra Bandyopadhyay

Leverage (statistics)

10.1016/j.patcog.2020.107697

Citations (42)