Shaoke Lou

Yale University

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Mark Gerstein

Yale University

Jonathan Warrell

Yale University

Donghoon Lee

Allen Institute for Brain Science

Larsson Omberg

Sage Bionetworks

Mark A. Rubin

University of Bern

Fábio C. P. Navarro

Personalis (United States)

Matthew Meyerson

Duke University

Alfonso Valencia

Barcelona Supercomputing Center

Paul Flicek

European Bioinformatics Institute

Rory Johnson

National Cancer Institute

Cooperative Institutions

Harvard University: 279
Broad Institute: 243
Yale University: 222
National Institutes of Health: 183
Wellcome Sanger Institute: 170
Stanford University: 168
Dana-Farber Cancer Institute: 149
The University of Texas MD Anderson Cancer Center: 143
Massachusetts Institute of Technology: 137
University of Cambridge: 134

Supplementary Figures S1-S3 from Integrative Genomic Analyses Yield Cell-Cycle Regulatory Programs with Prognostic Value

Chao Cheng, Shaoke Lou, Erik Andrews, Matthew Ung, Frederick S. Varn

Supplementary Figures S1-S3

Value (mathematics)

10.1158/1541-7786.22514515.v1

Citations (0)

An integrative ENCODE resource for cancer genomics

bioRxiv (Cold Spring Harbor Laboratory) (2019)

Jing Zhang, Donghoon Lee, Vineet K. Dhiman, Peng Jiang, Jie Xu

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.

ENCODE

10.1101/706424

Citations (12)
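The abstract above describes measuring each regulator's network change ("rewiring") between normal and tumor cell types. As a rough illustration only, and assuming each network is stored as a map from regulator to its set of target genes, one simple rewiring measure is the Jaccard distance between a regulator's target sets in the two networks. This is a sketch of the general idea, not necessarily the statistic used in the paper; the function names and toy data are hypothetical.

```python
def jaccard_rewiring(normal_targets: set[str], tumor_targets: set[str]) -> float:
    """Return 1 - |A & B| / |A | B|: 0.0 means identical targets, 1.0 means fully rewired."""
    union = normal_targets | tumor_targets
    if not union:
        return 0.0  # no edges in either network: treat the regulator as unchanged
    return 1.0 - len(normal_targets & tumor_targets) / len(union)


def rank_by_rewiring(
    normal_net: dict[str, set[str]], tumor_net: dict[str, set[str]]
) -> list[tuple[str, float]]:
    """Score every regulator present in either network, most rewired first (ties by name)."""
    regulators = normal_net.keys() | tumor_net.keys()
    scores = [
        (reg, jaccard_rewiring(normal_net.get(reg, set()), tumor_net.get(reg, set())))
        for reg in regulators
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy networks (hypothetical regulators and targets).
normal = {"TF1": {"g1", "g2"}, "TF2": {"g3"}}
tumor = {"TF1": {"g1", "g2"}, "TF2": {"g4"}}
print(rank_by_rewiring(normal, tumor))
```

A set-overlap measure ignores edge weights; a weighted network would call for a weighted variant.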

Author Correction: Divergent mutational processes distinguish hypoxic and normoxic tumours

Nature Communications (2022)

Vinayak Bhandari, Constance H. Li, Robert G. Bristow, Paul C. Boutros, Lauri A. Aaltonen

Hypoxia

10.1038/s41467-022-32339-4

Citations (0)

Abstract 4854: A computational framework for prioritizing noncoding regulatory variants in cancer

Cancer Research (2015)

Yao Fu, Zhu Liu, Shaoke Lou, Vincenza Colonna, Jason Bedford

Mutations in key regulatory sequences have been suggested to cause oncogenesis. However, identifying noncoding cancer “drivers” among thousands of somatic alterations is a difficult and unsolved problem. We report a computational framework, FunSeq, to annotate and prioritize these mutations. The framework combines an adjustable data context integrating large-scale genomics (e.g. ENCODE) and cancer resources with a streamlined variant-prioritization pipeline. The pipeline has a weighted scoring system combining: inter- and intra-species conservation (we used patterns of natural polymorphisms to identify human-specific conserved elements); loss- and gain-of-function events for transcription-factor binding; enhancer-gene linkages and network centrality; and per-element recurrence across samples. We further highlight putative drivers with information specific to a particular sample, such as differential gene expression. When applied to an individual tumor genome, our method is able to prioritize the TERT promoter mutation. We then evaluated our framework on a larger scale, first by comparing it with other existing noncoding variant-prioritization tools. Next, we used the recurrence of somatic mutations to validate some of our prioritized mutations. Finally, we developed the recurrence analysis into a database combining all whole-genome sequenced cancer samples and used this to provide higher confidence in mutation prioritization. FunSeq is available from funseq.gersteinlab.org. Note: This abstract was not presented at the meeting.
Citation Format: Yao Fu, Zhu Liu, Shaoke Lou, Vincenza Colonna, Jason Bedford, Xinmeng Mu, Kevin Y. Yip, Hyun Min Kang, Tuuli Lappalainen, Andrea Sboner, Haiyuan Yu, 1000 Genomes Project Consortium, Mark Rubin, Chris Tyler-Smith, Ekta Khurana, Mark Gerstein. A computational framework for prioritizing noncoding regulatory variants in cancer. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4854. doi:10.1158/1538-7445.AM2015-4854

Prioritization

ENCODE

10.1158/1538-7445.am2015-4854

Citations (1)
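The FunSeq abstract describes a weighted scoring system that sums evidence from conservation, TF-binding gain or loss, enhancer-gene linkage and network centrality, and recurrence. A minimal sketch of that kind of additive scoring follows. The feature names and weights are hypothetical placeholders chosen for illustration, not the values FunSeq uses, and real annotation of a variant against ENCODE and cancer resources is out of scope here.

```python
from dataclasses import dataclass, field

# Hypothetical feature weights for illustration only; not FunSeq's actual values.
FEATURE_WEIGHTS: dict[str, float] = {
    "conserved": 1.0,        # inter- or intra-species conservation
    "tf_motif_break": 0.8,   # loss-of-function event for TF binding
    "tf_motif_gain": 0.6,    # gain-of-function event for TF binding
    "linked_hub_gene": 0.7,  # element linked to a network-central gene
    "recurrent": 1.0,        # element mutated in other samples
}


@dataclass
class Variant:
    variant_id: str
    features: set[str] = field(default_factory=set)  # annotation features the variant hits


def score(variant: Variant, weights: dict[str, float] = FEATURE_WEIGHTS) -> float:
    """Sum the weights of the features this variant carries; unknown features count as 0."""
    return sum(weights.get(f, 0.0) for f in variant.features)


def prioritize(variants: list[Variant]) -> list[tuple[str, float]]:
    """Return (variant_id, score) pairs, highest score first."""
    ranked = [(v.variant_id, score(v)) for v in variants]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


ranked = prioritize([
    Variant("v1", {"conserved", "tf_motif_break", "recurrent"}),
    Variant("v2", {"tf_motif_gain"}),
    Variant("v3", {"unannotated"}),
])
print(ranked)
```

Sample-specific evidence such as differential expression could be added as further weighted features in the same dictionary.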

Additional file 3 of Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients

Figshare (2020)

Daniel Spakowicz, Shaoke Lou, Brian Barron, José L. Gómez, Tianxiao Li

Additional file 3: Table S2. Gene-microbe links. A tab-delimited text file listing all gene-microbe pairs and the confidence of each link.

RNA-Seq

10.6084/m9.figshare.12544898

Citations (0)
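Table S2 is described only as a tab-delimited file of gene-microbe pairs with a link confidence. A small reader for such a file might look like the sketch below. It assumes, without confirmation from the source, that the first three columns are gene, microbe and confidence, and it skips any row whose confidence field is not numeric (for example a header line). The function name and example identifiers are hypothetical.

```python
import csv
from typing import Iterable


def read_links(lines: Iterable[str], min_confidence: float = 0.0) -> list[tuple[str, str, float]]:
    """Parse tab-delimited gene/microbe/confidence rows, keeping rows at or above min_confidence.

    Assumed layout: column 1 = gene, column 2 = microbe, column 3 = confidence.
    """
    links = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # blank or truncated row
        gene, microbe, raw = row[0], row[1], row[2]
        try:
            confidence = float(raw)
        except ValueError:
            continue  # header or malformed row
        if confidence >= min_confidence:
            links.append((gene, microbe, confidence))
    return links


example = ["gene\tmicrobe\tconfidence\n", "GENE1\tMICROBE1\t0.9\n", "GENE2\tMICROBE2\t0.2\n"]
print(read_links(example, min_confidence=0.5))
```

Because `csv.reader` accepts any iterable of strings, the same function works on an open file handle, e.g. `read_links(fh, 0.5)` after `open(path, newline="")`.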

Bayesian structural time series for biomedical sensor data: A flexible modeling framework for evaluating interventions

PLoS Computational Biology (2021)

Jason Liu, Daniel Spakowicz, Garrett I. Ash, Rebecca Hoyd, Rohan Ahluwalia

The development of mobile-health technology has the potential to revolutionize personalized medicine. Biomedical sensors (e.g., wearables) can assist with determining treatment plans for individuals, provide quantitative information to healthcare providers, and give objective measurements of health, advancing the goal of precise phenotypic correlates for genotypes. Even though treatments and interventions are becoming more specific and datasets more abundant, measuring the causal impact of health interventions requires careful consideration of complex covariate structures, as well as knowledge of the temporal and spatial properties of the data. Thus, interpreting biomedical sensor data requires specialized statistical models. Here, we show how the Bayesian structural time series framework, widely used in economics, can be applied to these data. This framework corrects for covariates to provide accurate assessments of the significance of interventions. Furthermore, it allows for a time-dependent confidence interval of impact, which is useful for considering individualized assessments of intervention efficacy. We provide a customized biomedical adaptor tool, MhealthCI, built around a specific implementation of the Bayesian structural time series framework, that uniformly processes, prepares, and registers diverse biomedical data. We apply MhealthCI to a structured set of examples in biomedicine to showcase the ability of the framework to evaluate interventions with varying levels of data richness and covariate complexity, and we compare its performance with that of other models. Specifically, we show how the framework is able to evaluate an exercise intervention's effect on stabilizing blood glucose in a diabetes dataset. We also provide a future-anticipating illustration from a behavioral dataset showcasing how the framework integrates complex spatial covariates.
Overall, we show the robustness of the Bayesian structural time series framework when applied to biomedical sensor data, highlighting its increasing value for current and future datasets.

10.1371/journal.pcbi.1009303

Citations (17)
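The core idea behind structural time series intervention analysis is to fit a model to the pre-intervention period, forecast a counterfactual for the post-intervention period, and read the effect as observed minus predicted, with an interval that widens over time. The sketch below is a deliberately simplified stand-in, not MhealthCI or a full Bayesian structural time series: it uses a single local-level (random walk plus noise) component filtered with a Kalman filter, fixed noise variances chosen by hand, no covariates and no posterior sampling. The function name and default variances are assumptions for illustration.

```python
import math


def local_level_impact(pre: list[float], post: list[float],
                       level_var: float = 0.01, obs_var: float = 1.0):
    """Estimate intervention impact with a local-level model (fixed, assumed variances).

    Model: level_t = level_{t-1} + N(0, level_var);  y_t = level_t + N(0, obs_var).
    The Kalman filter runs over the pre-period only; the post-period counterfactual is the
    last filtered level, whose predictive variance grows by level_var per step ahead.
    Returns (pointwise effects, cumulative effect, 95% interval for each pointwise effect).
    """
    mean, var = pre[0], obs_var  # start the level at the first observation
    for y in pre[1:]:
        var += level_var                # predict: random-walk step of the level
        gain = var / (var + obs_var)    # Kalman gain
        mean += gain * (y - mean)       # update the level with the observation
        var *= 1.0 - gain               # posterior variance of the level
    effects, intervals = [], []
    for h, y in enumerate(post, start=1):
        pred_sd = math.sqrt(var + h * level_var + obs_var)  # h-step-ahead predictive sd
        effect = y - mean                                    # observed minus counterfactual
        effects.append(effect)
        intervals.append((effect - 1.96 * pred_sd, effect + 1.96 * pred_sd))
    return effects, sum(effects), intervals


effects, cumulative, intervals = local_level_impact([5.0] * 20, [7.0] * 5)
print(effects, cumulative)
```

Adding regression on covariates and sampling the variances (as a Bayesian implementation would) changes the counterfactual, but the observed-minus-predicted logic stays the same.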

Constructing a full, multiple-layer interactome for SARS-CoV-2 in the context of lung disease: Linking the virus with human genes and microbes

PLoS Computational Biology (2023)

Shaoke Lou, Mingjun Yang, Tianxiao Li, Weihao Zhao, Hannah Cevasco

The COVID-19 pandemic caused by the SARS-CoV-2 virus has resulted in millions of deaths worldwide. The disease presents with various manifestations that can vary in severity and long-term outcomes. Previous efforts have contributed to the development of effective strategies for treatment and prevention by uncovering the mechanism of viral infection. We now know all the direct protein–protein interactions that occur during the lifecycle of SARS-CoV-2 infection, but it is critical to move beyond these known interactions to a comprehensive understanding of the “full interactome” of SARS-CoV-2 infection, which incorporates human microRNAs (miRNAs), additional human protein-coding genes, and exogenous microbes. Potentially, this will help in developing new drugs to treat COVID-19, differentiating the nuances of long COVID, and identifying histopathological signatures in SARS-CoV-2-infected organs. To construct the full interactome, we developed a statistical modeling approach called MLCrosstalk (multiple-layer crosstalk) based on latent Dirichlet allocation. MLCrosstalk integrates data from multiple sources, including microbes, human protein-coding genes, miRNAs, and human protein–protein interactions. It constructs "topics" that group SARS-CoV-2 with genes and microbes based on similar patterns of co-occurrence across patient samples. We use these topics to infer linkages between SARS-CoV-2 and protein-coding genes, miRNAs, and microbes. We then refine these initial linkages using network propagation to contextualize them within a larger framework of network and pathway structures. Using MLCrosstalk, we identified genes in the IL1-processing and VEGFA–VEGFR2 pathways that are linked to SARS-CoV-2. We also found that Rothia mucilaginosa and Prevotella melaninogenica are positively and negatively correlated, respectively, with SARS-CoV-2 abundance, a finding corroborated by analysis of single-cell sequencing data.

Interactome

Crosstalk

Pandemic

10.1371/journal.pcbi.1011222

Citations (1)
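MLCrosstalk is described as refining its initial topic-derived linkages "using network propagation". Random walk with restart is one common form of network propagation; the sketch below shows it on an undirected graph, assuming a restart probability of 0.5. Whether MLCrosstalk uses this exact variant, this normalization or this restart value is not stated in the abstract, and the function name, graph and seeds are hypothetical.

```python
def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected graph.

    Iterates p <- (1 - restart) * W p + restart * p0, where W is the column-normalized
    adjacency matrix (each node splits its score evenly among its neighbors) and p0 is the
    seed vector restricted to graph nodes and normalized to sum to 1.
    """
    neighbors: dict[str, set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    total = sum(seeds.get(n, 0.0) for n in neighbors)
    if total <= 0:
        raise ValueError("seeds must give positive weight to at least one graph node")
    p0 = {n: seeds.get(n, 0.0) / total for n in neighbors}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in neighbors}   # restart mass goes back to seeds
        for n, score in p.items():
            share = (1.0 - restart) * score / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share                          # walk step to each neighbor
        delta = sum(abs(nxt[n] - p[n]) for n in neighbors)
        p = nxt
        if delta < tol:
            break
    return p


scores = random_walk_with_restart([("A", "B"), ("B", "C")], {"A": 1.0})
print(scores)
```

On this three-node path the stationary scores work out to 7/12 for the seed A, 1/3 for B and 1/12 for C: propagated scores decay with distance from the seed, which is how an initial linkage spreads to network neighbors.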

A data-driven single-cell and spatial transcriptomic map of the human prefrontal cortex

Science (2024)

Louise A. Huuki-Myers, Abby Spangler, Nicholas J. Eagles, Kelsey D. Montgomery, Sang Ho Kwon

The molecular organization of the human neocortex historically has been studied in the context of its histological layers. However, emerging spatial transcriptomic technologies have enabled unbiased identification of transcriptionally defined spatial domains that move beyond classic cytoarchitecture. We used the Visium spatial gene expression platform to generate a data-driven molecular neuroanatomical atlas across the anterior-posterior axis of the human dorsolateral prefrontal cortex. Integration with paired single-nucleus RNA-sequencing data revealed distinct cell type compositions and cell-cell interactions across spatial domains. Using PsychENCODE and publicly available data, we mapped the enrichment of cell types and genes associated with neuropsychiatric disorders to discrete spatial domains.

Cytoarchitecture

Neocortex

Dorsolateral prefrontal cortex

Human brain

Spatial contextual awareness

Cell type

10.1126/science.adh1938

Citations (15)

Author Correction: Integrative pathway enrichment analysis of multivariate omics data

Nature Communications (2022)

Marta Paczkowska, Jonathan Barenboim, Nardnisa Sintupisut, Natalie S. Fox, Helen Zhu

Omics

10.1038/s41467-022-32342-9

Citations (0)

GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner

PLoS Genetics (2019)

Shaoke Lou, Kellie A. Cotter, Tianxiao Li, Jin Liang, Hussein Mohsen

There has been much effort to prioritize genomic variants with respect to their impact on "function". However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. First, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding most predictive, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, has almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out.
In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL).

10.1371/journal.pgen.1007860

Citations (4)
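GRAM's benchmark is reported as AUROC (0.72 in GM12878, 0.66 in a multi-cell line dataset). AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). The short sketch below computes it directly from that definition; the scores and labels are toy values, not GRAM output.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(score of random positive > score of random negative); ties count 1/2.

    labels use 1 for positive (e.g. expression-modulating variant) and 0 for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop is O(positives x negatives) and is meant for clarity; a rank-based computation gives the same value in O(n log n) for large MPRA-sized datasets.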