Dysregulation of alternative splicing has been repeatedly associated with neurodevelopmental disorders, but the extent of cell-type-specific splicing in human neural development remains largely uncharted. Here, single-cell long-read sequencing in induced pluripotent stem cell (iPSC)-derived cerebral organoids identifies over 31,000 uncatalogued isoforms and 4,531 cell-type-specific splicing events. Long reads uncover coordinated splicing and cell-type-specific intron retention events, which are challenging to study with short reads. Retained neuronal introns are enriched in RNA splicing regulators, showing shorter lengths, higher GC contents, and weaker 5′ splice sites. We use this dataset to explore the biological processes underlying neurological disorders, focusing on autism. In comparison with prior transcriptomic data, we find that the splicing program in autistic brains is closer to the progenitor state than differentiated neurons. Furthermore, cell-type-specific exons harbor significantly more de novo mutations in autism probands than in siblings. Overall, these results highlight the importance of cell-type-specific splicing in autism and neuronal gene regulation.
Summary. Background & aims: Previous observational studies have yielded inconsistent findings regarding associations between red/processed meat intake and the risk of cardiovascular disease (CVD). Some studies have suggested positive relationships, while others have demonstrated no significant associations. However, causal effects remain uncertain. This 2023 Mendelian randomization (MR) study investigated the causal relationship between red and processed meat (pork, mutton, and beef) intake and CVD risk by analyzing summary data from the UK Biobank (exposure), CARDIoGRAMplusC4D (coronary artery disease [CAD]), MEGASTROKE (stroke), Nielsen et al. (atrial fibrillation [AF]), HERMES (heart failure [HF]), and FinnGen (cardiovascular outcomes) public databases. Methods: Genome-wide association studies (GWAS) of red meat (pork, beef, and mutton) and processed meat were sourced from the United Kingdom (UK) Biobank. GWAS data on CVD for this study were obtained from the Gene and FinnGen consortia. The primary method employed for the two-sample MR analysis was inverse variance weighting (IVW). Sensitivity analysis was performed to assess the reliability and consistency of the results. Results: Genetically predicted red and processed meat consumption did not demonstrate a causal association with any CVD outcomes when employing the IVW method. For processed meat intake, the odds ratios (ORs) (95% confidence intervals [CIs]) in large consortia were as follows: 0.88 (0.56–1.39) for CAD, 0.91 (0.65–1.27) for AF, 0.84 (0.58–1.21) for HF, and 1.00 (0.75–1.05) for stroke. In FinnGen, the ORs were as follows: 1.15 (0.83–1.59) for CAD, 1.25 (0.75–2.07) for AF, 1.09 (0.73–1.64) for HF, and 1.27 (0.85–1.91) for stroke. For beef intake, the ORs (95% CIs) in large consortia were as follows: 0.70 (0.28–1.73) for CAD, 0.85 (0.49–1.49) for AF, 0.80 (0.35–1.83) for HF, and 1.29 (0.85–1.95) for stroke.
In FinnGen, the ORs were as follows: 2.01 (0.75–5.39) for CAD, 1.83 (0.60–5.56) for AF, 0.80 (0.30–2.13) for HF, and 1.30 (0.62–2.73) for stroke. For pork intake, the ORs (95% CIs) in large consortia were as follows: 1.25 (0.37–4.22) for CAD, 1.26 (0.73–2.15) for AF, 1.71 (0.86–3.39) for HF, and 1.15 (0.63–2.11) for stroke. In FinnGen, the ORs were as follows: 1.12 (0.43–2.88) for CAD, 0.39 (0.08–1.83) for AF, 0.62 (0.20–1.88) for HF, and 0.60 (0.21–1.65) for stroke. For mutton intake, the ORs (95% CIs) in large consortia were as follows: 0.84 (0.48–1.44) for CAD, 0.84 (0.56–1.26) for AF, 1.04 (0.65–1.67) for HF, and 1.06 (0.77–1.45) for stroke. In FinnGen, the ORs were as follows: 1.20 (0.65–2.21) for CAD, 0.92 (0.44–1.92) for AF, 0.74 (0.34–1.58) for HF, and 0.75 (0.45–1.24) for stroke. The results remained robust and consistent in both the meta-analysis and supplementary MR analysis. Conclusions: This MR study demonstrated no significant causal relationships between red/processed meat intake and the risk of the four CVD outcomes examined. Further investigation is warranted to confirm these findings.
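The inverse variance weighting named as the primary MR method can be illustrated with a minimal sketch. It assumes the standard fixed-effect form, in which each variant's Wald ratio (outcome effect divided by exposure effect) is weighted by its first-order inverse variance. The function names and the toy summary statistics below are hypothetical and are not taken from the study; the study's actual software and settings are not stated here.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    Each variant contributes a Wald ratio by/bx, weighted by
    bx^2 / se^2 (first-order inverse variance of that ratio).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = (bx * bx) / (se * se)
        weighted_sum += weight * (by / bx)
        weight_total += weight
    beta = weighted_sum / weight_total
    standard_error = math.sqrt(1.0 / weight_total)
    return beta, standard_error

def to_odds_ratio(beta, standard_error, z=1.96):
    """Convert a log-odds estimate to an OR with an approximate 95% CI."""
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )

# Hypothetical toy data for three variants (illustration only).
toy_beta_exposure = [0.10, 0.20, 0.15]
toy_beta_outcome = [0.02, 0.04, 0.03]
toy_se_outcome = [0.01, 0.02, 0.015]

ivw_beta, ivw_se = ivw_estimate(toy_beta_exposure, toy_beta_outcome, toy_se_outcome)
odds_ratio, ci_low, ci_high = to_odds_ratio(ivw_beta, ivw_se)
print(f"OR = {odds_ratio:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

An OR whose interval spans 1, as with every estimate reported in the Results, is read as no evidence of a causal effect at the chosen significance level.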
Abstract Clustered regularly interspaced short palindromic repeats (CRISPR) screening coupled with single-cell RNA sequencing has emerged as a powerful tool to characterize the effects of genetic perturbations on the whole transcriptome at a single-cell level. However, because of the sparsity and complex structure of the data, analysis of single-cell CRISPR screening data is challenging. In particular, standard differential expression analysis methods are often underpowered to detect genes affected by CRISPR perturbations. We developed a statistical method for such data, called guided sparse factor analysis (GSFA). GSFA infers latent factors that represent coregulated genes or gene modules; by borrowing information from these factors, it infers the effects of genetic perturbations on individual genes. We demonstrated through extensive simulation studies that GSFA detects perturbation effects with much higher power than state-of-the-art methods. Using single-cell CRISPR data from human CD8+ T cells and neural progenitor cells, we showed that GSFA identified biologically relevant gene modules and specific genes affected by CRISPR perturbations, many of which were missed by existing methods, providing new insights into the functions of genes involved in T cell activation and neurodevelopment.
Abstract Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWAS) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is, however, challenging because the functional significance of non-coding mutations is difficult to predict. We propose a new statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance of the TADA method we developed earlier for DNM analysis in coding regions. TADA-A is able to incorporate many functional annotations such as conservation and enhancer marks, learn from data which annotations are informative of pathogenic mutations and combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies, while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ∼300 autism family trios across five studies, and discovered several new autism risk genes. The software is freely available for all research uses.
Abstract Trans-eQTLs collectively explain a substantial proportion of expression variation, yet are challenging to detect and replicate since their effects are individually weak. Many trans-effects are mediated by cis-gene expression and some of those effects are shared across tissue types/conditions. To detect robust cis-mediated trans-associations at the gene level and for specific single nucleotide polymorphisms (SNPs), we proposed two Cross-Condition Mediation methods, CCmed gene and CCmed GWAS, respectively. We analyzed data from 13 brain tissue types from the Genotype-Tissue Expression (GTEx) project, and identified trios with cis-eQTLs of a cis-gene having associations with a trans-gene, many of which show evidence of replication in other datasets. By applying CCmed GWAS, we identified trans-genes associated with known schizophrenia susceptibility loci. We further conducted validation analyses assessing the schizophrenia-risk associations of the identified trans-genes, by harnessing GWAS summary statistics from the Psychiatric Genomics Consortium and multitissue eQTL statistics from GTEx.
Interaction between transcription factors (TFs) and DNA plays a key role in regulating gene expression. It is generally believed that these interactions are controlled through recognition of DNA core motifs by TFs. Nevertheless, several studies have pointed out the limitations of this view; in particular, DNA sequence variants influencing TF binding are often located outside of core motifs. One possible explanation is that the physical properties of DNA may play a role in TF-DNA interactions. Recent studies have supported the importance of DNA shape features, especially in flanking regions of core motifs. Another important physical property of DNA is DNA breathing, the spontaneous opening of double-stranded DNA through thermal motions. However, there have been few genomic studies of the role of DNA breathing in TF-DNA interactions. In this work, we analyzed in vitro TF-DNA binding data of three TFs and found that DNA breathing features inside or near core motifs are correlated with binding affinity. This suggests that these TFs may prefer locally and temporally melted DNA formed through breathing. We extended the analysis to 44 TFs with in vivo ChIP-seq binding data. We found that for a large proportion of TFs, their breathing features in or near core motifs are associated with binding, but the sign and magnitude of these associations vary substantially across TF families. Altogether, our study supports the hypothesis that DNA breathing features near binding motifs contribute to TF-DNA interactions.
Proper regulation of when and where genes are expressed is crucial to biological development and function. This process is largely controlled by the interaction of transcription factors (TFs) with DNA sequences. The recognition of specific DNA sequences by TFs is important to ensure that only the correct genes are activated. Extensive work has shown that TFs prefer to bind certain DNA sequence patterns of 6-20 bp, known as motifs.
However, the structure of DNA molecules may also play a role. In this work, we explored the role of DNA breathing, which refers to the spontaneous opening of double-stranded DNA due to thermal motions. This process creates transient, single-stranded "bubbles" in DNA. By examining TF-DNA binding data of >60 TFs, we found that the propensity of DNA to form bubbles near motifs is often associated with the binding affinity of the DNA sequence. Interestingly, the patterns of these associations seem to vary across TFs. Altogether, our results highlight the potential of DNA breathing to influence TF-DNA interactions.