    motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences
    Abstract:
    Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are Binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
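The idea of a probability mass function for the difference of two correlated counts can be illustrated with a minimal Python sketch. It is not the motifDiverge API and not the paper's derivation. It assumes a simplified setting: both sequences contribute the same number of trials `n`, trials are paired position by position, pairs are independent of each other, and each pair is a correlated Bernoulli pair with marginals `p_x`, `p_y` and correlation `rho`. The names `diff_pmf` and `upper_tail` are hypothetical.

```python
import math


def diff_pmf(n, p_x, p_y, rho):
    """PMF of D = sum_i (X_i - Y_i) over n independent correlated Bernoulli pairs.

    Assumption (not from the paper): equal trial counts and independent pairs.
    Returns a list of length 2n + 1; index k holds P(D = k - n).
    """
    # Joint cell probabilities implied by the marginals and the correlation.
    sd = math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    p11 = p_x * p_y + rho * sd
    p10 = p_x - p11  # motif in sequence 1 only: contributes +1
    p01 = p_y - p11  # motif in sequence 2 only: contributes -1
    p00 = 1.0 - p11 - p10 - p01
    if min(p11, p10, p01, p00) < -1e-12:
        raise ValueError("rho is not feasible for these marginals")

    # Distribution of a single pair's difference over the values -1, 0, +1.
    step = [p01, p00 + p11, p10]

    # Convolve the single-pair distribution n times.
    pmf = [1.0]
    for _ in range(n):
        new = [0.0] * (len(pmf) + 2)
        for i, a in enumerate(pmf):
            for j, b in enumerate(step):
                new[i + j] += a * b
        pmf = new
    return pmf


def upper_tail(pmf, n, observed_diff):
    """One-sided tail probability P(D >= observed_diff)."""
    return sum(pmf[observed_diff + n:])
```

A call such as `upper_tail(diff_pmf(50, 0.10, 0.04, 0.3), 50, 5)` would give the probability of seeing a motif-count difference of at least 5 under these illustrative parameters. A positive `rho` concentrates the distribution of the difference around its mean, which is why ignoring dependence between related sequences can distort a significance test.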
    Keywords:
    Motif (music)
    Sequence motif
    DNA binding site
    Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood and enhancer de novo design is considered impossible. Here we built a deep learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally non-equivalent instances of the same TF motif that are determined by motif-flanking sequence and inter-motif distances. We validated these rules experimentally and demonstrated their conservation in human by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
    Sequence motif
    Motif (music)
    Citations (11)
    As transcription factor binding sites, motifs play an important role in promoter recognition and in gene transcription and expression. Finding their characteristics is therefore a meaningful subject. Based on the correlation detection of motifs as promoter features, a method for obtaining motif packages by independent component analysis is given. It decomposes the observed motif frequency matrix into smaller components and finally obtains motif packages of concurrent motifs. In the experimental results, the influence of DNA sequence data selection on the motif packages is analyzed. More stable motif relation structures can be obtained using longer motifs.
    Motif (music)
    Sequence motif
    Structural motif
    Transcription
    Citations (0)
    Author(s): Grudzien, Jessica | Advisor(s): Farley, Emma K | Abstract: Enhancers are elements within our genome that control where and when genes are expressed throughout development. However, how the sequence of the enhancer regulates tissue-specific expression is not fully understood. We can investigate sequence by looking at the transcription factor binding motifs within enhancers. We want to better understand how motif syntax (the order, orientation, and spacing of motifs) interplays with motif affinity to regulate gene expression. We term the connections between syntax and affinity enhancer grammar. We use the neural plate Otx-a enhancer within Ciona intestinalis as a model to investigate whether enhancer grammar is present, and to look for motif syntax and affinity trends that give different expression patterns. This enhancer activates when bound by GATA and ETS transcription factors. Our data show that there is a grammar present within the Otx-a enhancer, as different grammatical variants give varying expression patterns. Our data suggest that changing the motif order so that ETS binding sites are on both ends of the enhancer abolishes Otx gene expression in most developing embryos. Our data also show a loss of expression caused by ETS motifs directly next to each other in combination with non-optimal spacing between high-affinity GATA and ETS sites. We also found a grammatical variant with notochord expression, which may be due to two ETS sites close to a FoxA binding site. These findings help us better understand the grammar of the Otx-a enhancer and how enhancer sequence encodes tissue-specific gene expression in development.
    Sequence motif
    Ciona
    Enhancer RNAs
    Citations (0)
    The information about when and where each gene is to be expressed is mainly encoded in the DNA sequence of enhancers, sequence elements that comprise binding sites (motifs) for different transcription factors (TFs). Most of the research on enhancer sequences has been focused on TF motif presence, whereas the enhancer syntax, that is, the flexibility of important motif positions and how the sequence context modulates the activity of TF motifs, remains poorly understood. Here, we explore the rules of enhancer syntax by a two-pronged approach in Drosophila melanogaster S2 cells: we (1) replace important TF motifs by all possible 65,536 eight-nucleotide-long sequences and (2) paste eight important TF motif types into 763 positions within 496 enhancers. These complementary strategies reveal that enhancers display constrained sequence flexibility and the context-specific modulation of motif function. Important motifs can be functionally replaced by hundreds of sequences constituting several distinct motif types, but these are only a fraction of all possible sequences and motif types. Moreover, TF motifs contribute with different intrinsic strengths that are strongly modulated by the enhancer sequence context (the flanking sequence, the presence and diversity of other motif types, and the distance between motifs), such that not all motif types can work in all positions. The context-specific modulation of motif function is also a hallmark of human enhancers, as we demonstrate experimentally. Overall, these two general principles of enhancer sequences are important to understand and predict enhancer function during development, evolution, and in disease.
    Sequence motif
    Motif (music)
    Structural motif
    Citations (12)
    Many transposable elements (TEs) contain transcription factor binding sites and are implicated as potential regulatory elements. However, TEs are rarely functionally tested for regulatory activity, which in turn limits our understanding of how TE regulatory activity has evolved. We systematically tested the human LTR18A subfamily for regulatory activity using a massively parallel reporter assay (MPRA) and found AP-1 and C/EBP-related binding motifs as drivers of enhancer activity. Functional analysis of evolutionarily reconstructed ancestral sequences revealed that LTR18A elements have generally lost regulatory activity over time through sequence changes, with the largest effects occurring due to mutations in the AP-1 and C/EBP motifs. We observed that the two motifs are conserved at higher rates than expected based on neutral evolution. Finally, we identified LTR18A elements as potential enhancers in the human genome, primarily in epithelial cells. Together, our results provide a model for the origin, evolution, and co-option of TE-derived regulatory elements.
    Subfamily
    Conserved sequence
    DNA binding site
    Sequence motif
    Citations (1)
    The information about when and where each gene is to be expressed is mainly encoded in the DNA sequence of enhancers, sequence elements that comprise binding sites (motifs) for different transcription factors (TFs). Most of the research on enhancer sequences has been focused on TF motif presence, while the enhancer syntax, i.e. the flexibility of important motif positions and how the sequence context modulates the activity of TF motifs, remains poorly understood. Here, we explore the rules of enhancer syntax by a two-pronged approach in Drosophila melanogaster S2 cells: we (1) replace important motifs by an exhaustive set of all possible 65,536 eight-nucleotide-long random sequences and (2) paste eight important TF motif types into 763 positions within 496 enhancers. These complementary strategies reveal that enhancers display constrained sequence flexibility and the context-specific modulation of motif function. Important motifs can be functionally replaced by hundreds of sequences constituting several distinct motif types, but only a fraction of all possible sequences and motif types restore enhancer activity. Moreover, TF motifs contribute with different intrinsic strengths that are strongly modulated by the enhancer sequence context (the flanking sequence, presence and diversity of other motif types, and distance between motifs), such that not all motif types can work in all positions. The context-specific modulation of motif function is also a hallmark of human enhancers and TF motifs, as we demonstrate experimentally. Overall, these two general principles of enhancer sequences are important to understand and predict enhancer function during development, evolution and in disease.
    Motif (music)
    Sequence motif
    Structural motif
    Melanogaster
    Citations (1)