Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigated mis-splicing using RNA-sequencing data from ~14K control samples spanning 42 human body sites, focusing on split reads that partially map to annotated transcripts. We show that mis-splicing occurs at different rates across introns and tissues and that these splicing inaccuracies are primarily affected by the abundance of core components of the spliceosome assembly and its regulators. Using publicly available data on short-hairpin RNA knockdowns of these spliceosomal components, we found support for the importance of RNA-binding proteins in mis-splicing. We also demonstrated that mis-splicing is positively correlated with age and that it affects genes implicated in neurodegenerative diseases. This in-depth characterisation of mis-splicing can have important implications for our understanding of the role of splicing inaccuracies in human disease and for the interpretation of long-read RNA-sequencing data.
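The abstract does not give a formula for the mis-splicing rate. A minimal sketch, assuming the rate for an annotated intron is the fraction of its split reads that come from novel junctions already assigned to it, shows how rates could be compared across introns and tissues. The input layout, the function names and the toy tissue labels are hypothetical.

```python
from statistics import median
from typing import Dict, Optional, Tuple

# Hypothetical input layout: tissue -> intron ID -> (reads supporting the annotated
# intron, reads from novel junctions already assigned to that intron).
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def mis_splicing_rate(annotated_reads: int, novel_reads: int) -> Optional[float]:
    """Assumed definition: novel reads / (novel + annotated reads); None if no reads."""
    total = annotated_reads + novel_reads
    if total == 0:
        return None
    return novel_reads / total


def rates_by_tissue(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Per-tissue, per-intron mis-splicing rates, skipping introns with no reads."""
    result: Dict[str, Dict[str, float]] = {}
    for tissue, introns in counts.items():
        rates: Dict[str, float] = {}
        for intron, (annotated, novel) in introns.items():
            rate = mis_splicing_rate(annotated, novel)
            if rate is not None:
                rates[intron] = rate
        result[tissue] = rates
    return result


def median_rate_per_tissue(counts: Counts) -> Dict[str, float]:
    """Summarise each tissue by the median rate across its introns."""
    return {
        tissue: median(rates.values())
        for tissue, rates in rates_by_tissue(counts).items()
        if rates
    }


if __name__ == "__main__":
    # Toy counts for illustration only, not real data.
    toy = {
        "tissue_A": {"intron_1": (90, 10), "intron_2": (95, 5), "intron_3": (80, 20)},
        "tissue_B": {"intron_1": (99, 1), "intron_2": (98, 2), "intron_3": (0, 0)},
    }
    print(median_rate_per_tissue(toy))
```

Summarising by the median keeps a tissue's value from being dominated by a handful of heavily mis-spliced introns; other summaries are equally plausible.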
Improvements in functional genomic annotation have led to a critical mass of neurogenetic discoveries. This is exemplified in hereditary ataxia, a heterogeneous group of disorders characterised by incoordination from cerebellar dysfunction. Pathogenic variants in more than 300 genes have been described, leading to a detailed genetic classification partitioned by age of onset. Despite these advances, up to 75% of patients with ataxia remain molecularly undiagnosed even following whole genome sequencing, as exemplified in the 100 000 Genomes Project. This study aimed to understand whether we can improve our knowledge of the genetic architecture of hereditary ataxia by leveraging functional genomic annotations and, as a result, generate insights and strategies that raise the diagnostic yield. To achieve these aims, we used publicly available multi-omics data to generate 294 genic features, capturing information relating to a gene’s structure, genetic variation, tissue-specific, cell-type-specific and temporal expression, as well as the protein products of a gene. We studied these features across genes typically causing childhood-onset disease, adult-onset disease or both, first individually and then collectively. This led to the generation of testable hypotheses, which we investigated using whole genome sequencing data from up to 2182 individuals presenting with ataxia and 6658 non-neurological probands recruited in the 100 000 Genomes Project. Using this approach, we demonstrated a high short tandem repeat (STR) density within childhood-onset genes, suggesting that we may be missing pathogenic repeat expansions within this cohort. This was verified in both childhood- and adult-onset ataxia patients from the 100 000 Genomes Project, who were unexpectedly found to have a trend towards larger repeat sizes even at naturally occurring STRs within known ataxia genes, implying a role for STRs in pathogenesis.
Using unsupervised analysis, we found significant similarities in genomic annotation across the gene panels, suggesting that adult- and childhood-onset patients should be screened using a common diagnostic gene set. We tested this within the 100 000 Genomes Project by assessing the burden of pathogenic variants among childhood-onset genes in adult-onset patients and vice versa. This demonstrated a significantly higher burden of rare, potentially pathogenic variants in conventional childhood-onset genes among individuals with adult-onset ataxia. Our analysis has implications for current clinical practice in genetic testing for hereditary ataxia. We suggest that the diagnostic rate for hereditary ataxia could be increased by removing the age-of-onset partition and by modifying screening to detect repeat expansions in naturally occurring STRs within known ataxia-associated genes, in effect treating these regions as candidate pathogenic loci.
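The abstract does not name the statistical test or comparison group behind the burden analysis. One simple way to frame such a comparison, assuming carrier counts of rare, potentially pathogenic variants in a gene set are compared between patients and non-neurological probands, is a one-sided Fisher's exact test. The sketch computes it directly from the hypergeometric distribution; the table layout and function names are assumptions.

```python
from math import comb


def fisher_one_sided_greater(a: int, b: int, c: int, d: int) -> float:
    """Exact one-sided Fisher test for enrichment in the top-left cell.

    Assumed table layout:
                      carriers   non-carriers
        patients         a            b
        controls         c            d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b  # number of patients
    col1 = a + c  # number of carriers overall
    n = a + b + c + d
    # Sum exact integer numerators first, then divide once to limit rounding error.
    numerator = sum(
        comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numerator / comb(n, col1)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); infinite when b*c == 0."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)
```

Applied in both directions (childhood-onset genes in adult-onset patients, and adult-onset genes in childhood-onset patients), a small p-value with an odds ratio above 1 indicates a higher carrier burden in patients than in the comparison group.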
Genetic correlation (rg) between traits can offer valuable insight into underlying shared biological mechanisms. Neurodegenerative diseases overlap neuropathologically and often manifest comorbid neuropsychiatric symptoms. However, global rg analyses show minimal rg among neurodegenerative and neuropsychiatric diseases. Importantly, local rg can exist in the absence of global relationships. To investigate this possibility, we applied LAVA, a tool for local rg analysis, to genome-wide association studies of 3 neurodegenerative diseases (Alzheimer’s disease, Lewy body dementia and Parkinson’s disease) and 3 neuropsychiatric disorders (bipolar disorder, major depressive disorder and schizophrenia). We identified several local genetic correlations missed in global analyses, including between (i) all 3 neurodegenerative diseases and schizophrenia and (ii) Alzheimer’s and Parkinson’s disease. For local genetic correlations identified in genomic regions containing disease-implicated genes, such as SNCA, CLU and APOE, incorporating expression quantitative trait loci identified genes that may drive genetic overlaps between diseases. Collectively, we demonstrate that complex genetic relationships exist among neurodegenerative and neuropsychiatric diseases, highlighting putative pleiotropic genomic regions and genes. These findings imply sharing of pathogenic processes and the potential existence of common therapeutic targets.
Visual hallucinations are common in Parkinson's disease and are associated with poorer prognosis. Imaging studies show white matter loss and functional connectivity changes with Parkinson's visual hallucinations, but the biological factors underlying selective vulnerability of affected parts of the brain network are unknown. Recent models for Parkinson's disease hallucinations suggest they arise from a shift in the relative effects of different networks. Understanding how structural connectivity affects the interplay between networks will provide important mechanistic insights. To address this, we investigated the structural connectivity changes that accompany visual hallucinations in Parkinson's disease and the organizational and gene expression characteristics of the preferentially affected areas of the network. We performed diffusion-weighted imaging in 100 patients with Parkinson's disease (81 without hallucinations, 19 with visual hallucinations) and 34 healthy age-matched controls. We used network-based statistics to identify changes in structural connectivity in Parkinson's disease patients with hallucinations and performed an analysis of controllability, an emerging technique that quantifies the influence a brain region has across the rest of the network. Using these techniques, we identified a subnetwork of reduced connectivity in Parkinson's disease hallucinations. Within this subnetwork, Parkinson's disease patients with hallucinations showed reduced controllability (less influence over other brain regions) compared with Parkinson's disease patients without hallucinations and controls. This subnetwork appears to be critical for overall brain integration, as even in controls, nodes with high controllability were more likely to lie within it. We then used the Allen Institute for Brain Science human transcriptome atlas to identify regional gene expression patterns associated with affected areas of the network.
Gene expression analysis of gene modules related to the affected subnetwork revealed that down-weighted genes were most significantly enriched for mRNA and chromosome metabolic processes (with enrichment in oligodendrocytes) and up-weighted genes for protein localization (with enrichment in neuronal cells). Our findings provide insights into how hallucinations are generated, implicating breakdown of a key structural subnetwork that exerts control across distributed brain regions. Expression of genes related to mRNA metabolism and membrane localization may be implicated, providing potential therapeutic targets.
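The abstract does not specify which controllability metric was used. A minimal sketch of one widely used formulation, average controllability, assumes a linear discrete-time model x(t+1) = A x(t) + e_i u(t) on the structural connectivity matrix A, scaled by 1 plus its largest singular value for stability; a node's value is the trace of its controllability Gramian. The function names, the power-iteration routine and the finite horizon are assumptions, written in pure Python to stay dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def largest_singular_value(A: Matrix, iters: int = 500) -> float:
    """Power iteration on A^T A; the norm of A^T A v for unit v approaches sigma_max^2.
    An all-ones start vector works for non-negative connectivity matrices."""
    At = transpose(A)
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(iters):
        w = mat_vec(At, mat_vec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        estimate = norm
        v = [x / norm for x in w]
    return math.sqrt(estimate)


def average_controllability(A: Matrix, horizon: int = 200) -> List[float]:
    """Average controllability of each node, truncated after `horizon` steps.

    A is scaled by 1 + sigma_max so the system is stable; the value for node i is
    trace(sum_t A_norm^t e_i e_i^T (A_norm^T)^t) = sum_t ||A_norm^t e_i||^2.
    """
    n = len(A)
    scale = 1.0 + largest_singular_value(A)
    A_norm = [[a / scale for a in row] for row in A]
    values = []
    for i in range(n):
        v = [1.0 if j == i else 0.0 for j in range(n)]
        total = 0.0
        for _ in range(horizon):
            total += sum(x * x for x in v)
            v = mat_vec(A_norm, v)
        values.append(total)
    return values
```

For a three-node star graph the hub scores higher than the leaves, matching the intuition that well-connected nodes exert more influence over the rest of the network.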
Genome-wide association studies have identified an increasing number of common genetic variants that affect neurological and psychiatric disease risk. Given that many causal variants are likely to operate by regulating gene expression, an improved understanding of the genetic control of gene expression in human brain is vital. However, the difficulty of sampling human brain, together with its complexity, has meant that brain-related expression quantitative trait loci (eQTL) and allele-specific expression (ASE) signals have been more limited in explanatory power than might otherwise be expected. To address this, we use paired genomic and transcriptomic data from putamen and substantia nigra dissected from 117 brains, combined with a comprehensive set of analyses, to interrogate regulation at different stages of RNA processing and uncover novel transcripts. We identify disease-relevant regulatory loci and reveal the types of analyses and regulatory positions yielding the most disease-specific information. We find that splicing eQTLs are enriched for neuron-specific regulatory information, that ASE analyses provide highly cell-specific regulatory information, and that incomplete annotation of the brain transcriptome limits the interpretation of risk loci for neuropsychiatric disease. We release this rich resource of regulatory data through a searchable webserver, http://braineacv2.inf.um.es/.
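The abstract does not describe how ASE was called. A common starting point, assumed here, is an exact two-sided binomial test of reference-allele reads at a heterozygous site against an expected fraction of 0.5; this ignores reference mapping bias and overdispersion, which dedicated ASE tools model. The function names are hypothetical.

```python
import math


def binom_log_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), via lgamma to avoid overflow at high depth."""
    return (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )


def ase_binomial_p(ref_reads: int, total_reads: int, p: float = 0.5) -> float:
    """Two-sided exact binomial test of ref_reads out of total_reads against p.

    Sums the probabilities of all outcomes no more likely than the observed one;
    a small relative tolerance absorbs floating-point ties between outcomes.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= ref_reads <= total_reads:
        raise ValueError("need 0 <= ref_reads <= total_reads")
    probs = [math.exp(binom_log_pmf(i, total_reads, p)) for i in range(total_reads + 1)]
    observed = probs[ref_reads]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))
```

With 9 reference reads out of 10, `ase_binomial_p(9, 10)` gives 22/1024 (about 0.021), since outcomes 0, 1, 9 and 10 are each at most as likely as the one observed.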
Genome-wide association studies of late-onset Alzheimer’s disease (AD) have highlighted the importance of variants associated with genes expressed by the innate immune system in determining risk for AD. Recently, we and others have shown that genes associated with variants that confer risk for AD are significantly enriched in transcriptional networks expressed by amyloid-responsive microglia. This allowed us to predict new risk genes for AD, including the interferon-responsive oligoadenylate synthetase 1 (OAS1). However, the function of OAS1 within microglia and its genetic pathway are not known. Using genotype data from 1,313 individuals with sporadic AD and 1,234 control individuals, we confirm that the OAS1 variant rs1131454 is associated with increased risk for AD and decreased OAS1 expression. Moreover, we note that the same locus was recently associated with critical illness in response to COVID-19, linking variants that are associated with AD and a severe response to COVID-19. By analysing single-cell RNA-sequencing (scRNA-seq) data from microglia isolated from APP NL-G-F knock-in and wild-type C57BL/6J mice, we identify a transcriptional network that is significantly upregulated with age and amyloid deposition and contains the mouse orthologue Oas1a, providing evidence that Oas1a plays an age-dependent role in the innate immune system. We identify a similar interferon-related transcriptional network containing OAS1 by analysing scRNA-seq data from human microglia isolated from individuals with AD. Finally, using human iPSC-derived microglial cells (h-iPSC-Mg), we find that OAS1 is required to limit the pro-inflammatory response of microglia. We note that, when stimulated with interferon-gamma (IFN-γ), cells with lower OAS1 expression show an exaggerated pro-inflammatory response, with increased expression and secretion of TNF-α.
Collectively, our data support a link between genetic risk for AD and susceptibility to critical illness with COVID-19 centred on OAS1 and interferon signalling, a finding with potential implications for future treatments of both AD and COVID-19, and the development of biomarkers to track disease progression.
To facilitate precision medicine and neuroscience research, we developed a machine-learning technique that scores the likelihood that a gene, when mutated, will cause a neurological phenotype. We analysed 1126 genes relating to 25 subtypes of Mendelian neurological disease defined by Genomics England (March 2017), together with 154 gene-specific features capturing genetic variation, gene structure, and tissue-specific expression and co-expression. We randomly re-sampled genes with no known disease association to develop bootstrapped decision-tree models, which were integrated to generate a decision tree-based ensemble for each disease subtype. Genes generating larger numbers of distinct transcripts and with a higher probability of carrying missense mutations in normal individuals were significantly more likely to cause neurological diseases. Using mouse-mutant phenotypic data, we tested the accuracy of gene-phenotype predictions and found that for 88% of all disease subtypes there was a significant enrichment of relevant phenotypic abnormalities when predicted genes were mutated in mice; in many cases, mutations produced specific and matching phenotypes. Furthermore, using only newly identified genes included in the Genomics England November 2017 release, we assessed our gene-phenotype predictions and showed an 8.3-fold enrichment relative to chance for correct predictions. Thus, we demonstrate both the explanatory and predictive power of machine-learning-based models in neurological disease.
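The abstract describes re-sampling genes with no known disease association and combining bootstrapped decision-tree models into an ensemble. A heavily simplified sketch of that idea follows. To stay dependency-free it swaps full decision trees for one-level decision stumps, and the balanced 1:1 sampling of unlabeled genes, the vote-fraction score and all function names are assumptions rather than the published pipeline.

```python
import random
from typing import List, Sequence, Tuple

Features = Sequence[float]
Stump = Tuple[int, float, bool]  # (feature index, threshold, positive when above threshold)


def fit_stump(X: List[Features], y: List[int]) -> Stump:
    """Exhaustively pick the single-feature threshold rule with the highest accuracy."""
    best: Stump = (0, 0.0, True)
    best_acc = -1.0
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Midpoints between consecutive distinct values; fall back to the value itself.
        thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])] or values
        for thr in thresholds:
            for above in (True, False):
                correct = sum(
                    ((row[f] > thr) == above) == bool(label) for row, label in zip(X, y)
                )
                acc = correct / len(y)
                if acc > best_acc:
                    best_acc, best = acc, (f, thr, above)
    return best


def predict(stump: Stump, row: Features) -> bool:
    f, thr, above = stump
    return (row[f] > thr) == above


def train_ensemble(
    positives: List[Features], unlabeled: List[Features], n_models: int = 25, seed: int = 0
) -> List[Stump]:
    """Each model sees all disease genes plus an equal-sized random draw of unlabeled genes."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        negatives = rng.sample(unlabeled, k=min(len(positives), len(unlabeled)))
        X = list(positives) + list(negatives)
        y = [1] * len(positives) + [0] * len(negatives)
        models.append(fit_stump(X, y))
    return models


def ensemble_score(models: List[Stump], row: Features) -> float:
    """Fraction of models voting 'disease gene', a score in [0, 1]."""
    return sum(predict(m, row) for m in models) / len(models)
```

Re-sampling the unlabeled genes for each model means no single draw of presumed negatives (some of which may be undiscovered disease genes) dominates the final score.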
Dysregulation of RNA splicing contributes to both rare and complex diseases. RNA-sequencing data from human tissues have shown that this process can be inaccurate, resulting in novel introns detected at low frequency across samples and within an individual. To enable exploration of the full spectrum of intron usage, we have developed IntroVerse, which offers an extensive catalogue of the splicing of 332,571 annotated introns and a linked set of 4,679,474 novel junctions covering 32,669 different genes. This dataset was generated through the analysis of 17,510 human control RNA samples from 54 tissues provided by the Genotype-Tissue Expression Consortium. IntroVerse has two unique features: (i) it provides a complete catalogue of novel junctions and (ii) each novel junction has been assigned to a specific annotated intron. This unique, hierarchical structure offers multiple uses, including the identification of novel transcripts from known genes and their tissue-specific usage, and the assessment of background splicing noise for introns thought to be mis-spliced in disease states. IntroVerse provides a user-friendly web interface and is freely available at https://rytenlab.com/browser/app/introverse.
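The abstract states that every novel junction is assigned to a specific annotated intron. A minimal sketch of such a hierarchy, assuming a novel junction is assigned when it shares exactly one splice site with a single annotated intron on the same strand, labels each junction as a novel donor or novel acceptor. `Junction`, `classify` and `build_catalogue` are hypothetical names, not IntroVerse code, and the linear scan would need an index keyed by splice site at the scale quoted above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Junction:
    chrom: str
    start: int   # lower genomic coordinate of the intron
    end: int     # upper genomic coordinate of the intron
    strand: str  # "+" or "-"


def classify(novel: Junction, intron: Junction) -> Optional[str]:
    """Label a novel junction relative to one annotated intron, or return None.

    A junction qualifies only if it shares exactly one splice site with the intron.
    On the "+" strand the donor (5' splice site) is at the intron start; on the
    "-" strand it is at the intron end.
    """
    if (novel.chrom, novel.strand) != (intron.chrom, intron.strand):
        return None
    shares_start = novel.start == intron.start
    shares_end = novel.end == intron.end
    if shares_start == shares_end:  # identical intron, or no shared splice site
        return None
    if novel.strand == "+":
        return "novel_acceptor" if shares_start else "novel_donor"
    return "novel_donor" if shares_start else "novel_acceptor"


def build_catalogue(
    annotated: List[Junction], novel: List[Junction]
) -> Dict[Junction, Dict[str, List[Junction]]]:
    """Attach each novel junction to the single annotated intron it can be assigned to.

    Junctions matching no intron, or more than one, are left out (an assumed rule).
    """
    catalogue: Dict[Junction, Dict[str, List[Junction]]] = {
        intron: {"novel_donor": [], "novel_acceptor": []} for intron in annotated
    }
    for junction in novel:
        hits = []
        for intron in annotated:
            label = classify(junction, intron)
            if label is not None:
                hits.append((intron, label))
        if len(hits) == 1:
            intron, label = hits[0]
            catalogue[intron][label].append(junction)
    return catalogue
```

Keeping novel junctions nested under their parent intron is what lets per-intron background noise be read off directly, for example by counting the reads of each intron's novel donors and acceptors.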
The basis for clinical variation related to underlying progressive supranuclear palsy (PSP) pathology is unknown. We performed a genome-wide association study (GWAS) to identify genetic determinants of PSP phenotype. Two independent pathologically and clinically diagnosed PSP cohorts were genotyped and phenotyped to create Richardson syndrome (RS) and non-RS groups. We carried out separate logistic regression GWASs to compare RS and non-RS groups and then combined datasets to carry out a whole cohort analysis (RS = 367, non-RS = 130). We validated our findings in a third cohort by referring to data from 100 deeply phenotyped cases from a recent GWAS. We assessed the expression/coexpression patterns of our identified genes and used our data to carry out gene-based association testing. Our lead single nucleotide polymorphism (SNP), rs564309, showed an association signal in both cohorts, reaching genome-wide significance in our whole cohort analysis (odds ratio = 5.5, 95% confidence interval = 3.2-10.0, p = 1.7 × 10⁻⁹). rs564309 is an intronic variant of the tripartite motif-containing protein 11 (TRIM11) gene, a component of the ubiquitin proteasome system (UPS). In our third cohort, minor allele frequencies of surrogate SNPs in high linkage disequilibrium with rs564309 replicated our findings. Gene-based association testing confirmed an association signal at TRIM11. We found that TRIM11 is predominantly expressed neuronally, in the cerebellum and basal ganglia. Our study suggests that the TRIM11 locus is a genetic modifier of PSP phenotype and potentially adds further evidence for the UPS having a key role in tau pathology, therefore representing a target for disease-modifying therapies. Ann Neurol 2018;84:485-496.
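The reported odds ratio comes from logistic regression. For intuition only, an unadjusted allelic odds ratio with a Woolf (log-scale) 95% confidence interval can be computed from a 2×2 table of minor/major allele counts in two phenotype groups (for example RS vs non-RS). This is a simpler swapped-in technique that will not reproduce the published estimate, and the function and argument names are hypothetical.

```python
import math
from typing import Tuple


def allelic_odds_ratio(
    group1_minor: int,
    group1_major: int,
    group2_minor: int,
    group2_major: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio of the minor allele in group 1 vs group 2, with a
    Woolf confidence interval: exp(ln OR +/- z * sqrt(sum of 1/cell))."""
    cells = (group1_minor, group1_major, group2_minor, group2_major)
    if min(cells) <= 0:
        raise ValueError("all cells must be positive; add a continuity correction for zeros")
    odds_ratio = (group1_minor * group2_major) / (group1_major * group2_minor)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)
```

Because the interval is symmetric on the log scale, its bounds are not symmetric around the odds ratio itself, which is also why a reported interval such as 3.2-10.0 sits asymmetrically around 5.5.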