Dissecting the myriad regulatory mechanisms controlling eukaryotic transcripts from production to degradation requires quantitative measurements of mRNA flow across the cell. We developed subcellular TimeLapse-seq to measure the rates at which RNAs are released from chromatin, exported from the nucleus, loaded onto polysomes, and degraded within the nucleus and cytoplasm. These rates varied substantially, yet transcripts from genes with related functions or targeted by the same transcription factors and RNA binding proteins flowed across subcellular compartments with similar kinetics. Verifying these associations uncovered roles for DDX3X and PABPC4 in nuclear export. For hundreds of genes, most transcripts were degraded within the nucleus, while the remaining molecules were exported and persisted with stable lifespans. Transcripts residing on chromatin for longer had extended poly(A) tails, whereas the reverse was observed for cytoplasmic mRNAs. Finally, a machine learning model identified additional molecular features that underlie the diverse life cycles of mammalian mRNAs.
Single-cell quantification of transcription kinetics and variability promotes a mechanistic understanding of gene regulation. Here, using single-molecule RNA fluorescence in situ hybridization and mathematical modeling, we dissect cellular RNA dynamics for Arabidopsis FLOWERING LOCUS C (FLC). FLC expression quantitatively determines flowering time and is regulated by antisense (COOLAIR) transcription. In cells without observable COOLAIR expression, we quantify FLC transcription initiation, elongation, intron processing, and lariat degradation, as well as mRNA release from the locus and degradation. In these heterogeneously sized cells, FLC mRNA number increases linearly with cell size, resulting in a large cell-to-cell variability in transcript level. This variation is accounted for by cell-size-dependent, Poissonian FLC mRNA production, but not by large transcriptional bursts. In COOLAIR-expressing cells, however, antisense transcription increases with cell size and contributes to FLC transcription decreasing with cell size. Our analysis therefore reveals an unexpected role for antisense transcription in modulating the scaling of transcription with cell size.
Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality in health care. Understanding which drug targets are linked to ADRs can lead to the development of safer medicines. Here, we analyze in vitro secondary pharmacology of common (off) targets for 2134 marketed drugs. To associate these drugs with human ADRs, we utilized FDA Adverse Event Reports and developed random forest models that predict ADR occurrences from in vitro pharmacological profiles. By evaluating Gini importance scores of model features, we identify 221 target-ADR associations, which co-occur in PubMed abstracts to a greater extent than expected by chance. Among these are established relations, such as the association of in vitro hERG binding with cardiac arrhythmias, which further validate our machine learning approach. Evidence on bile acid metabolism supports our identification of associations between the Bile Salt Export Pump and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Unexpectedly, our model suggests PDE3 is associated with 40 ADRs. These associations provide a comprehensive resource to support drug development and human biology studies.
The primary bottleneck in high-throughput genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Existing methods such as Gene Ontology (GO) enrichment analysis provide insight at the gene set level. For individual genes, GO annotations are static and biological context can only be added by manual literature searches. Here, we introduce GeneWalk (github.com/churchmanlab/genewalk), a method that identifies individual genes and their relevant functions under a particular experimental condition. After automatic assembly of an experiment-specific gene regulatory network, GeneWalk quantifies the similarity between vector representations of each gene and its GO annotations through representation learning, yielding annotation significance scores that reflect their functional relevance for the experimental context. We demonstrate the use of GeneWalk analysis of RNA-seq and nascent transcriptome (NET-seq) data from human cells and mouse brains, validating the methodology. By performing gene- and condition-specific functional analysis that converts a list of genes into data-driven hypotheses, GeneWalk accelerates the interpretation of high-throughput genetics experiments.
Low copy number plasmids in bacteria require segregation for stable inheritance through cell division. This is often achieved by a parABC locus, comprising an ATPase ParA, DNA-binding protein ParB and a parC region, encoding ParB-binding sites. These minimal components space plasmids equally over the nucleoid, yet the underlying mechanism is not understood. Here we investigate a model where ParA-ATP can dynamically associate to the nucleoid and is hydrolyzed by plasmid-associated ParB, thereby creating nucleoid-bound, self-organizing ParA concentration gradients. We show mathematically that differences between competing ParA concentrations on either side of a plasmid can specify regular plasmid positioning. Such positioning can be achieved regardless of the exact mechanism of plasmid movement, including plasmid diffusion with ParA-mediated immobilization or directed plasmid motion induced by ParB/parC-stimulated ParA structure disassembly. However, we find experimentally that parABC from Escherichia coli plasmid pB171 increases plasmid mobility, inconsistent with diffusion/immobilization. Instead our observations favor directed plasmid motion. Our model predicts less oscillatory ParA dynamics than previously believed, a prediction we verify experimentally. We also show that ParA localization and plasmid positioning depend on the underlying nucleoid morphology, indicating that the chromosomal architecture constrains ParA structure formation. Our directed motion model unifies previously contradictory models for plasmid segregation and provides a robust mechanistic basis for self-organized plasmid spacing that may be widely applicable.
Background: Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality in health care. Understanding which drug targets are linked to ADRs can lead to the development of safer medicines.
Methods: Here, we analyse in vitro secondary pharmacology of common (off) targets for 2134 marketed drugs. To associate these drugs with human ADRs, we utilized FDA Adverse Event Reports and developed random forest models that predict ADR occurrences from in vitro pharmacological profiles.
Findings: By evaluating Gini importance scores of model features, we identify 221 target-ADR associations, which co-occur in PubMed abstracts to a greater extent than expected by chance. Amongst these are established relations, such as the association of in vitro hERG binding with cardiac arrhythmias, which further validate our machine learning approach. Evidence on bile acid metabolism supports our identification of associations between the Bile Salt Export Pump and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Unexpectedly, our model suggests PDE3 is associated with 40 ADRs.
Interpretation: These associations provide a comprehensive resource to support drug development and human biology studies.
Funding: This study was not supported by any formal funding bodies.
Quantifying crucial steps in gene regulation during transcription elongation, such as promoter-proximal pausing, requires high resolution methods to map the transcription machinery across the genome. Native Elongating Transcript sequencing (NET-seq) interrogates the 3' ends of nascent RNA through sequencing, providing a direct visualization of RNA Polymerase II (Pol II) positions genome-wide with strand specificity and single nucleotide resolution. NET-seq applied to human cells has uncovered regions of Pol II pausing at the boundaries of retained exons and convergent antisense transcription near transcription start sites (Mayer et al. 2015). It has also been used to investigate regulators of productive elongation (Winter et al. 2017), and the directionality of promoter regions (Jin et al. 2017). Here, we describe the experimental protocol for metazoan cells that includes a spike-in control enabling normalization across samples. We also report on an improved bioinformatics pipeline for NET-seq. Together, the protocol yields a fast and non-perturbative method to map Pol II transcription genome-wide, revealing complex and global transcriptional events.
1. Introduction
Transcription regulation has many layers of complexity. Canonically, we understand that transcription is controlled by promoters and distal regulatory regions, or enhancers, that are proposed to be the primary determinant of cell type–specific gene expression. Additionally, transcription elongation and other post-initiation events, such as promoter-proximal pausing, are emerging as crucial regulatory steps in controlling gene expression (Adelman and Lis 2012; Margaritis and Holstege 2008). Non-coding RNA transcripts, such as long intergenic non-coding and antisense RNAs, are involved in gene regulation of specific genes or of larger regions, such as in X chromosome inactivation (Rinn and Chang 2012; Ietswaart, Wu, and Dean 2012).
Deep analyses of steady-state pools of RNA have yielded useful insights into nascent transcription (Boswell et al. 2017), but they fail to capture unstable RNA species, such as enhancer RNAs, and transcriptional pausing. Traditional transcription run-on techniques only observe a few genes at a time. As transcription elongation has emerged as a crucial regulatory step in controlling gene expression, it is critical to directly monitor the elongation process to identify all layers of gene regulation. Native elongating transcript sequencing (NET-seq) is a non-perturbative method that detects actively elongating RNA polymerase II (Pol II) genome-wide, in vivo, with strand-specific, single-nucleotide resolution. This chapter provides a step-by-step protocol for NET-seq and a bioinformatics pipeline for sequence analysis.
The NET-seq protocol begins with purifying nascent RNA, which is done by cellular fractionation in metazoan cells (Figure 1a). Briefly, cells are lysed, and the nuclei are isolated from the cytoplasm by centrifugation through a sucrose cushion buffer. The nuclei are then washed to eliminate any remaining cytoplasmic matter. The chromatin fraction is isolated from the nucleoplasm using urea, salt, and mild detergents. While urea removes most chromatin-bound proteins, it does not remove histones or elongating RNA polymerase from DNA. The RNA polymerase-RNA-DNA ternary complex is stable and capable of withstanding high concentrations of salt, urea, and detergents (Wuarin and Schibler 1994; Cai and Luse 1987). Due to this stability, the isolated chromatin fraction is enriched for nascent RNAs that arise from transcriptionally engaged RNA polymerase complexes. As the histone proteins also remain on DNA after urea treatment, the chromatin fraction can be isolated through low-speed centrifugation. Isolating the chromatin fraction completes the process of purifying the nascent RNA.
An advantage of this purification approach is that it does not require restarting transcription in vitro, so nascent RNA can be isolated from Pol II in multiple transcriptional states. Furthermore, it avoids the biases that can arise in antisera-based purification of Pol II from epitope masking and cross-reactivity.
After the nascent RNA has been purified, the sequencing library preparation can proceed (Figure 1b). This begins with the ligation of a linker to the 3′ end of the RNA, which preserves the information on the 3′ end of RNAs and allows the location of Pol II to be determined with nucleotide precision. The linker contains a random sequence at the 5′ end that serves as a unique molecular identifier (UMI), enabling the bioinformatic detection of multiple library generation biases, including PCR duplicates and reverse transcription mispriming events. After 3′ ligation of the linker, the RNA is fragmented and size selected. Fragmenting the RNA helps avoid length biases in any downstream enzymatic reactions. After fragmentation, the RNA is reverse transcribed to create single-stranded cDNA using a primer with a long overhang. The cDNA, containing a 3′ adaptor, is then circularized. Sequence elements introduced in the RT primer can then serve as a 5′ adaptor to allow for PCR amplification (see Figure 1b). The PCR product is then sequenced using a next-generation sequencing platform, typically Illumina. The resulting sequencing reads are aligned, and data analysis is then performed. The NET-seq bioinformatics pipeline for sequence alignment has been improved relative to the original pipeline in Mayer et al. 2015: the RT mispriming and PCR duplicate removal scripts now perform a more stringent comparison of read alignments, which reduces the number of reads filtered out at this stage and yields 2- to 3-fold higher coverage (scripts are available at the Churchman lab GitHub).
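To illustrate how a UMI can be used to detect PCR duplicates, the following Python sketch collapses reads that share the same alignment coordinates and the same UMI into a single molecule. It is a minimal illustration under stated assumptions and not the Churchman lab scripts: the `AlignedRead` record, its field names, and the function `collapse_pcr_duplicates` are hypothetical, and the sketch assumes that the Pol II position and UMI have already been extracted from each aligned read.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal record for one aligned read (illustration only)."""
    chrom: str
    strand: str    # "+" or "-"
    position: int  # inferred Pol II position (3' end of the nascent RNA)
    umi: str       # random barcode sequence carried by the linker


Site = Tuple[str, str, int]


def collapse_pcr_duplicates(reads: Iterable[AlignedRead]) -> Dict[Site, int]:
    """Count unique molecules at each (chrom, strand, position).

    Reads that share both alignment coordinates and UMI are treated as
    PCR copies of a single ligation event and are counted once.
    """
    umis_at_site = defaultdict(set)
    for read in reads:
        umis_at_site[(read.chrom, read.strand, read.position)].add(read.umi)
    return {site: len(umis) for site, umis in umis_at_site.items()}


reads = [
    AlignedRead("chr1", "+", 1000, "ACGTAC"),
    AlignedRead("chr1", "+", 1000, "ACGTAC"),  # same site and UMI: duplicate
    AlignedRead("chr1", "+", 1000, "TTGACA"),  # same site, different UMI: kept
    AlignedRead("chr1", "-", 1000, "ACGTAC"),  # opposite strand: separate site
]
counts = collapse_pcr_duplicates(reads)
print(counts)
```

Keying on strand as well as position keeps sense and antisense signal separate, in line with the strand specificity of the method. A production pipeline would additionally need to handle sequencing errors within the UMI and the RT mispriming filter, neither of which is attempted here.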
NET-seq surveys transcriptional processes and Pol II occupancy with DNA strand specificity and nucleotide resolution. It does not require any genetic modification or any metabolic labeling of nascent RNA. NET-seq also works on a number of different cell types including various human cell lines (Winter et al. 2017; Mayer et al. 2015), as well as mouse and Drosophila cells (see Note 19). It should be noted that RNA processing intermediates and some mature chromatin-associated RNAs may be included in the library, although these can be computationally removed through comparison with gene annotations. Furthermore, NET-seq does not provide information about the positions of pre-initiation complexes and transcription start sites. In sum, NET-seq is a straightforward, easy to use, in vivo, high resolution methodology that captures complex and global transcriptional events by directly monitoring nascent coding, non-coding, antisense, intergenic, and enhancer RNAs.
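Because the protocol includes a spike-in control to enable normalization across samples, one common way to use such a control is to rescale each sample so that its spike-in read count matches a shared reference. The Python sketch below shows this idea only in outline. The function names, the choice of the smallest spike-in count as the reference, and the example read counts are illustrative assumptions and are not taken from the published pipeline.

```python
from typing import Dict, List


def spike_in_scale_factors(spike_in_counts: Dict[str, int]) -> Dict[str, float]:
    """Return per-sample factors that equalize spike-in read counts.

    Each factor is (smallest spike-in count across samples) / (sample count),
    so the sample with the fewest spike-in reads keeps a factor of 1.0.
    """
    if not spike_in_counts:
        raise ValueError("at least one sample is required")
    if any(count <= 0 for count in spike_in_counts.values()):
        raise ValueError("every sample needs a positive spike-in read count")
    reference = min(spike_in_counts.values())
    return {sample: reference / count for sample, count in spike_in_counts.items()}


def normalize_coverage(coverage: List[float], factor: float) -> List[float]:
    """Scale a per-position coverage track by a sample's spike-in factor."""
    return [value * factor for value in coverage]


# Hypothetical spike-in read counts for two samples.
factors = spike_in_scale_factors({"control": 200_000, "treated": 100_000})
scaled = normalize_coverage([10.0, 4.0, 0.0], factors["control"])
print(factors, scaled)
```

With these made-up counts, the control sample has twice as many spike-in reads as the treated sample, so its coverage is halved before the two samples are compared.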