Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
The identification of functional non-coding mutations is a key challenge in the field of genomics. Here we introduce μ-cisTarget to filter, annotate and prioritize cis-regulatory mutations based on their putative effect on the underlying "personal" gene regulatory network. We validated μ-cisTarget by re-analyzing the TAL1 and LMO1 enhancer mutations in T-ALL, and the TERT promoter mutation in melanoma. Next, we re-sequenced the full genomes of ten cancer cell lines and used matched transcriptome data and motif discovery to identify master regulators with de novo binding sites that result in the up-regulation of nearby oncogenic drivers. μ-cisTarget is available from http://mucistarget.aertslab.org .
Abstract Single-cell assay for transposase-accessible chromatin by sequencing (scATAC-seq) has emerged as a powerful tool for dissecting regulatory landscapes and cellular heterogeneity. However, an exploration of systemic biases among scATAC-seq technologies has remained absent. In this study, we benchmark the performance of eight scATAC-seq methods across 47 experiments using human peripheral blood mononuclear cells (PBMCs) as a reference sample and develop PUMATAC, a universal preprocessing pipeline, to handle the various sequencing data formats. Our analyses reveal significant differences in sequencing library complexity and tagmentation specificity, which impact cell-type annotation, genotype demultiplexing, peak calling, differential region accessibility and transcription factor motif enrichment. Our findings underscore the importance of sample extraction, method selection, data processing and total cost of experiments, offering valuable guidance for future research. Finally, our data and analysis pipeline encompasses 169,000 PBMC scATAC-seq profiles and a best practices code repository for scATAC-seq data analysis, which are freely available to extend this benchmarking effort to future protocols.
The diversity of cell types and regulatory states in the brain, and how these change during aging, remains largely unknown. We present a single-cell transcriptome atlas of the entire adult Drosophila melanogaster brain sampled across its lifespan. Cell clustering identified 87 initial cell clusters that are further subclustered and validated by targeted cell-sorting. Our data show high granularity and identify a wide range of cell types. Gene network analyses using SCENIC revealed regulatory heterogeneity linked to energy consumption. During aging, RNA content declines exponentially without affecting neuronal identity in old brains. This single-cell brain atlas covers nearly all cells in the normal brain and provides the tools to study cellular diversity alongside other Drosophila and mammalian single-cell datasets in our unique single-cell analysis platform: SCope (http://scope.aertslab.org). These results, together with SCope, allow comprehensive exploration of all transcriptional states of an entire aging brain.
Abstract Joint profiling of chromatin accessibility and gene expression in individual cells provides an opportunity to decipher enhancer-driven gene regulatory networks (GRNs). Here we present a method for the inference of enhancer-driven GRNs, called SCENIC+. SCENIC+ predicts genomic enhancers along with candidate upstream transcription factors (TFs) and links these enhancers to candidate target genes. To improve both recall and precision of TF identification, we curated and clustered a motif collection with more than 30,000 motifs. We benchmarked SCENIC+ on diverse datasets from different species, including human peripheral blood mononuclear cells, ENCODE cell lines, melanoma cell states and Drosophila retinal development. Next, we exploit SCENIC+ predictions to study conserved TFs, enhancers and GRNs between human and mouse cell types in the cerebral cortex. Finally, we use SCENIC+ to study the dynamics of gene regulation along differentiation trajectories and the effect of TF perturbations on cell state. SCENIC+ is available at scenicplus.readthedocs.io .
Article Figures and data Abstract Editor's evaluation Introduction Results Discussion Materials and methods Data availability References Decision letter Author response Article and author information Metrics Abstract Understanding how enhancers drive cell-type specificity and efficiently identifying them is essential for the development of innovative therapeutic strategies. In melanoma, the melanocytic (MEL) and the mesenchymal-like (MES) states present themselves with different responses to therapy, making the identification of specific enhancers highly relevant. Using massively parallel reporter assays (MPRAs) in a panel of patient-derived melanoma lines (MM lines), we set to identify and decipher melanoma enhancers by first focusing on regions with state-specific H3K27 acetylation close to differentially expressed genes. An in-depth evaluation of those regions was then pursued by investigating the activity of overlapping ATAC-seq peaks along with a full tiling of the acetylated regions with 190 bp sequences. Activity was observed in more than 60% of the selected regions, and we were able to precisely locate the active enhancers within ATAC-seq peaks. Comparison of sequence content with activity, using the deep learning model DeepMEL2, revealed that AP-1 alone is responsible for the MES enhancer activity. In contrast, SOX10 and MITF both influence MEL enhancer function with SOX10 being required to achieve high levels of activity. Overall, our MPRAs shed light on the relationship between long and short sequences in terms of their sequence content, enhancer activity, and specificity across melanoma cell states. Editor's evaluation This study describes an integrative analysis of the location, regulation, and function of melanoma cell state-specific enhancer elements. By comparing enhancer activity through massively parallel reporter assays, chromatin features, and underlying TF binding profiles in melanocytic and mesenchymal-like melanoma cell states, the authors identify candidate regulators and mechanisms that explain enhancer activity and specificity in melanoma biology. These findings will be of broad interest to those seeking to understand cell type- or cell identify-specific gene regulation at the level of transcriptional and epigenetic control of cis-regulatory elements. https://doi.org/10.7554/eLife.71735.sa0 Decision letter Reviews on Sciety eLife's review process Introduction Enhancers are crucial regulatory regions in the genome that control cell type-specific gene expression. Identifying enhancers helps to better understand cell identity and is key to develop therapies targeting a singular relevant cell type in a disease. To date, accurate prediction of enhancer location and cell-type activity remains a challenge. Both the presence and clustering of transcription factor binding sites (TFBSs) are good predictors of enhancer activity (Gasperini et al., 2020; King et al., 2020). Yet, such an approach requires prior knowledge of the cis-regulatory grammar in the studied cell types as only a small proportion of the TFBSs found in the genome are bound by the corresponding transcription factor (TF) (Yáñez-Cuna et al., 2012). Another strategy to identify candidate enhancers is to use active enhancer marks such as H3K27ac and chromatin accessibility (Gray et al., 2017; Minnoye et al., 2021; Rada-Iglesias et al., 2011). The most successful studies, combining this approach with transcriptome data, generated libraries with up to 60% of active enhancers in the target cell type (Gorkin et al., 2020; Graybuck et al., 2021). Massively parallel reporter assays (MPRAs) have been developed to screen the activity of thousands of sequences simultaneously (Arnold et al., 2013; Inoue and Ahituv, 2015; Melnikov et al., 2012; White et al., 2013). However, limitations of sequence synthesis constrain one to choose either a large number of short sequences (e.g., thousands of sequences of 150–250 bp) or a small number of longer sequences (e.g., dozens of sequences of 500–1000 bp) (Inoue and Ahituv, 2015). This issue, combined with the difficulty of identifying putative enhancers, leads to a low rate of active enhancers in MPRAs. Here, we study enhancer location, specificity, and regulatory grammar in melanoma using a variety of MPRA strategies. Melanoma exhibits pronounced heterogeneity within and between patients (Grzywa et al., 2017). Two main subtypes or cell states are discernible, the melanocytic (MEL) and the mesenchymal-like (MES) state (Hoek et al., 2008; Hoek et al., 2006; Verfaillie et al., 2015), as well as more recently identified variants of the MEL state, such as the neural crest-like and intermediate states (Rambow et al., 2018; Tsoi et al., 2018; Wouters et al., 2020). The MEL and MES subtypes display distinct epigenomic and transcriptomic profiles, resulting in divergent phenotypes such as migration and therapy response (Wouters et al., 2020; Verfaillie et al., 2015). Thus, the identification of subtype-specific enhancers may be relevant for therapy, where it could improve safety and efficiency by narrowing down the effect of the treatment to a specific population. Comparisons between MEL and MES yield thousands of regions with differential acetylation (H3K27ac) and accessibility. However, it is unclear which of these subtype-specific regions function as active enhancers and which TFs are responsible for their activity. In this study, we analyzed regions with MEL- and MES-specific H3K27ac chromatin marks proximal to differentially expressed genes. We designed MPRA experiments to test those regions at three different levels in a panel of patient-derived malignant melanoma (MM) lines. Our results precisely locate the origins of enhancer activity within the larger H3K27ac domains. In addition, we can accurately predict their subtype specificity and ultimately identify a set of rules governing MEL and MES enhancer activity. Furthermore, we show that a melanoma deep learning model (DeepMEL2; Atak et al., 2021) trained on ATAC-seq data pinpoints which TFBSs drive enhancer activity and specificity. Results Design of MPRA libraries based on H3K27ac, ATAC-seq, and synthetic sequences H3K27ac ChIP-seq peaks are often used for the selection of candidate enhancers (Creyghton et al., 2010; Fox et al., 2020; Fu et al., 2018). However, such peaks can encompass large genomic regions, often 2–3 kb long, while enhancers are usually only a few hundred base pairs in size (Gasperini et al., 2020; Li and Wunderlich, 2017). To investigate the relationship between H3K27ac signal, chromatin accessibility peaks, and enhancer activity, we designed MPRA libraries at three different levels: 1.2–2.9-kb-sized H3K27ac ChIP-seq peaks, 501-bp-sized ATAC-seq peaks that fall within the H3K27ac regions, and 190-bp subsequences tiling the entire H3K27ac regions. We designed the H3K27ac ChIP-seq-based library (H3K27ac library) by selecting regions that are specifically acetylated in either the MEL or the MES melanoma cell state and located around differentially expressed genes in a panel of 12 melanoma lines: 3 MES lines (MM029, MM047, and MM099) and 9 MEL lines that cover a spectrum from pure melanocytic to intermediate melanoma (MM001, MM011, MM031, MM034, MM057, MM074, MM087, MM118, SKMEL5) (see Materials and methods, Figure 1—figure supplement 1a, Figure 1a; Minnoye et al., 2020). A special consideration was given to regions overlapping ChIP-seq peaks for SOX10 and MITF (for MEL regions) or AP-1 (JUN and JUNB; for MES regions), known regulators of each state (Wouters et al., 2020). A total of 35 MES- and 18 MEL-specific regions, with an average size of 1987 bp, were amplified from genomic DNA (Supplementary file 1). Regions were named ‘GENE_(-)000X,’ where GENE is the associated target gene, (-)000 is the distance to the TSS, and X denotes whether the enhancer is distal, intronic, or at a promoter position (see Materials and methods). The H3K27ac ChIP-seq signal across these selected regions displays a good correlation between cell lines of the same subtype and a negative correlation between cell lines of a different subtype (Figure 1—figure supplement 1b). This correlation is also observed in the ATAC-seq signal and the target gene expression (Figure 1—figure supplement 1c, d, and h). We created two vector libraries based on the CHEQ-seq vector backbone (Verfaillie et al., 2016) by cloning the sequences upstream (5′ position) or downstream (intron position) of a minimal promoter (SCP1, see Materials and methods, Figure 1b). Figure 1 with 1 supplement see all Download asset Open asset Selection and cloning of melanoma state specific regions. (a) Cell state-specific regions were selected based on H3K27ac ChIP-seq signal from a panel of melanoma cell lines containing both mesenchymal-like (MES) (red bar) and melanocytic (MEL) lines (pure MEL: blue bar; intermediate MEL: purple bar). ATAC-seq data from the same lines were used to identify accessibility peaks within these regions. Finally, the regions were tiled with 190 bp tiles with a shift of 20 bp (sublibrary A). Sublibrary B was generated by shifting all tiles 10 bp downstream. Cluster-Buster (CBust) was used to identify transcription factor (TF) motifs, and new tiles were generated with mutated motifs. (b) Reporter vector configurations used for the evaluation of the H3K27ac enhancers (left panel), the ATAC-seq enhancers (top-right panel), and the enhancer tiling (bottom-right panel). SCP1, Super Core Promoter 1; BC, barcode. Next, we designed a second library consisting of the ATAC-seq peaks contained within the H3K27ac library regions (Figure 1a). We used only the H3K27ac peaks that are identified as active in the H3K27ac library or that are assigned as MEL- or MES-specific regulatory regions (respectively represented by regions belonging to topic 4 and topic 7) in our previously published cisTopic analysis of ATAC-seq data from 16 human melanoma cell lines (Bravo González-Blas et al., 2019). Each sequence was defined by taking the summit of the ATAC-seq peak and extending 250 bp on each side, resulting in a 501 bp candidate sequence. In total, 28 MES- and 18 MEL-specific ATAC-seq peaks were selected from 37 of the 53 H3K27ac regions (for nine regions, two ATAC-seq peaks were selected). Over the 501 bp of those regions, the correlation of the H3K27ac ChIP-seq, ATAC-seq, and RNA-seq signals between the different cell lines remains the same as for the H3K27ac library (Figure 1—figure supplement 1e-g). We cloned the regions in the CHEQ-seq vector, upstream of the SCP1 promoter (Figure 1b). In addition, we cloned the same set of sequences in the STARR-seq vector (an alternative MPRA vector) to assess assay-related variability (Muerdter et al., 2018). Finally, to locate the precise origin of enhancer activity, we generated two tiling libraries (A and B, see Materials and methods), encompassing the entire H3K27ac regions. The tiles are 190 bp long with a 20 bp shift between consecutive tiles. The tiling for library A starts at nucleotide position 1 of the H3K27ac ChIP-seq regions while library B starts at position 11, resulting in a final tiling resolution of 10 bp when both sublibraries are taken into account. We also sought to probe the effect of mutations within putative TFBSs in the selected H3K27ac regions. We used Cluster-Buster (CBust; Frith et al., 2003) with position weight matrices (PWMs) for binding sites of key regulators of MEL (SOX10, MITF, TFAP2A) and MES (AP-1 and TEAD) (Wouters et al., 2020) to identify candidate TFBSs and generated tiles carrying mutated versions of these motifs (see Materials and methods). For each sublibrary, 800 shuffled tiles were generated as negative controls, resulting in a total of 7412 and 7356 tiles for sublibraries A and B, respectively (Figure 1a). Each sublibrary is separately cloned upstream of the SCP1 promoter in the CHEQ-seq vector and is transfected individually (Figure 1b, lower-right panel). Most MEL-specific acetylated regions harbor enhancer activity in MEL lines We first transfected all MPRA libraries in the most melanocytic line, MM001 (Minnoye et al., 2020; Wouters et al., 2020). Of the MEL-specific H3K27ac regions, 75% (14/18) display significant enhancer activity (Benjamini–Hochberg adjusted p-values < 0.05, see Materials and methods) in MM001 with a mean log2 fold change (FC) of 0.23 compared to 26% (8/32) for the MES-specific regions with a mean log2 FC of –1.21 (Figure 2a). The activities are consistent across the two library designs (enhancers cloned into the intron or upstream of the TSS; Figure 2—figure supplement 1a and b). The ATAC-based library recapitulates these activities, suggesting that the enhancer activity is contained within the sequence of the ATAC-seq peak (Figure 2b, Figure 2—figure supplement 1c–e). Interestingly, in the five MEL-specific H3K27ac regions where two ATAC-seq peaks were assessed, only one of the two recapitulates the activity of the encompassing region (Figure 2—figure supplement 2). This was independently confirmed using the STARR-seq MPRA (Figure 2—figure supplement 1c). Figure 2 with 2 supplements see all Download asset Open asset Enhancer activity in MM001. (a, b) Enhancer activity profile for the CHEQ-seq intron H3K27ac library (a) and CHEQ-seq ATAC-seq library (b). Enhancer regions displayed in panel (c) have their name indicated in bold and their value is displayed with a triangle. (c) Enhancer activity of regions selected around the TYR genes. SOX10 and MITF ChIP-seq as well as H3K27ac ChIP-seq and ATAC-seq for MM001 are displayed, and, in the zoomed-in regions (light gray areas), DeepMEL2 predictions and CHEQ-seq values of the enhancer tiling B library are represented. Dark gray areas are regions not covered by a tile. CHEQ-seq activity is visible in the ‘H3K27ac enhancers’ and ‘ATAC-seq enhancers’ tracks. Benjamini–Hochberg adjusted p-values: *<0.05; ***<0.001. Dashed box: region not recovered following DNA synthesis, cloning, or massively parallel reporter assay (MPRA). (d) Percentage of overlap between active tiles and ATAC-seq peaks (left) and high DeepMEL2 predictions with ATAC-seq peaks (right).(e, f) Heatmaps of H3K27ac library (e) and ATAC-seq library (f), displaying H3K27ac ChIP-seq signal, ATAC-seq signal, and enhancer activity in MM001 ordered by MPRA values. Only MPRA values of significantly active enhancers are displayed. Figure 2—source data 1 CHEQ-seq 5′ H3K27ac activity. Enhancer activity values for the CHEQ-seq 5′ H3K27ac library in all cell lines. These data were also used in Figures 3 and 6, Figure 2—figure supplement 1, and Figure 3—figure supplement 1. https://cdn.elifesciences.org/articles/71735/elife-71735-fig2-data1-v2.txt Download elife-71735-fig2-data1-v2.txt Figure 2—source data 2 CHEQ-seq intron H3K27ac activity. Enhancer activity values for the CHEQ-seq intron H3K27ac library in all cell lines. These data were also used in Figures 2—6, Figure 2—figure supplements 1 and 2, Figure 3—figure supplements 1 and 2, and Figure 4—figure supplement 1. https://cdn.elifesciences.org/articles/71735/elife-71735-fig2-data2-v2.txt Download elife-71735-fig2-data2-v2.txt Figure 2—source data 3 CHEQ-seq ATAC activity. Enhancer activity values for the CHEQ-seq ATAC library in all cell lines. These data were also used in Figures 3—6, Figure 2—figure supplements 1 and 2, Figure 3—figure supplements 1 and 2, and Figure 4—figure supplement 1. https://cdn.elifesciences.org/articles/71735/elife-71735-fig2-data3-v2.txt Download elife-71735-fig2-data3-v2.txt Figure 2—source data 4 CHEQ-seq tiling library B activity. Enhancer activity values for the CHEQ-seq enhancer tiling library B in all cell lines. These data were also used in Figures 3—6Figure 2—figure supplement 2Figure 3—figure supplement 2, Figure 4—figure supplement 1, Figure 5—figure supplement 1, and Figure 6—figure supplement 1. https://cdn.elifesciences.org/articles/71735/elife-71735-fig2-data4-v2.txt Download elife-71735-fig2-data4-v2.txt Figure 2—source data 5 DeepMEL2 prediction scores for tiling library B. DeepMEL2_GABPA prediction scores for the enhancer tiling B library. These data were also used in Figures 3, 5 and 6, Figure 2—figure supplement 2, Figure 3—figure supplement 2, and Figure 4—figure supplement 1. https://cdn.elifesciences.org/articles/71735/elife-71735-fig2-data5-v2.txt Download elife-71735-fig2-data5-v2.txt Figure 2—source data 6 STARR-seq ATAC activity. Enhancer activity values for the STARR-seq ATAC library in all cell lines. These data were also used in Figures 3 and 6, Figure 2—figure supplement 1, and Figure 3—figure supplement 1. https://cdn.elifesciences.org/articles/71735/elife-71735-fig2-data6-v2.txt Download elife-71735-fig2-data6-v2.txt Next, we examined the activity of all 190 bp sequences tiled along the entire H3K27ac regions. We confirmed that the majority of active tiles (92.2%) are located within an ATAC-seq peak (Figure 2c and d, Figure 2—figure supplement 2) and identified active tiles in 7 out of the 10 most active MEL ATAC-seq-based enhancers. Short 190 bp regions can thus often recapitulate the enhancer activity of the larger encompassing region. When two or more consecutive tiles are active, the enhancer may be contained in an even smaller sequence or the activity is coming from independently active elements close to one another. We recently trained a deep learning model on cis-regulatory topics (sets of co-accessible genomic regions clustered by cisTopic; Bravo González-Blas et al., 2019) from 30 melanoma lines, called DeepMEL2, that accurately predicts the accessibility and activity of a sequence in the different melanoma subtypes (Atak et al., 2021). Each topic used to train the model regrouped accessible regions found in one cell line, in a specific subtype, or in all cell lines. Two topics are associated with the MEL subtype, topics 16 and 17, mostly focused on SOX and MITF motifs, respectively. We scored our 190 bp tiles with DeepMEL2 and found high MEL prediction scores (>0.10, see Materials and methods) specifically within ATAC-seq peaks (Figure 2d). Of the top DeepMEL2 predictions for MEL specific topics, 11% (topic 16) and 17.3% (topic 17) are active tiles in the MPRA (with 0.25/0.375 recall and 0.11/0.173 precision for topics 16 and 17, respectively). These low precision values may be explained by the fact that the DeepMEL2 model was trained on ATAC-seq data, thus yielding high prediction scores within ATAC-seq peaks, yet not all of these show positive MPRA activity. In some cases, we identified multiple acetylated regions near one gene. For the tyrosinase (TYR) gene, expressed specifically in MEL lines (Figure 1—figure supplement 1h), three regions were selected as MEL-specific and tested at the acetylation, accessibility, and tiling levels (Figure 2c). TYR_–9-D and TYR_–1-D regions display high reporter activity, which is subsequently found in the selected ATAC-seq peak. Activity is further narrowed down to tiles corresponding to the DeepMEL2 predictions (Figure 2c). The TYR_P region, at the gene’s promoter site, has a low activity that is also not found when tiling the region, despite the DeepMEL2 predictions. Those findings suggest that TYR expression in MEL lines is largely dependent on the activity of distal enhancers. Some other enhancers that are active in the H3K27ac and ATAC-seq libraries are not recapitulated in the tile library (e.g., SOX10_–34-D; Figure 2—figure supplement 2c). This can be due to technical reasons, such as the small size of the tiles. Nevertheless, from the combination of the ATAC-seq and enhancer tiling MPRAs, we can conclude that not all subtype-specific ATAC-seq peaks function as a stand-alone enhancer. We finally compared signals of H3K27ac ChIP-seq mean score, ATAC-seq mean score, and their corresponding MPRA signals (Figure 2e and f). With the H3K27ac library (Figure 2e), the acetylation signal measured over the selected regions correlates well with the accessibility signal (Pearson’s correlation r = 0.77), indicating that ATAC-seq peaks are found within the selected acetylated regions and maintain the same differential signal. Active enhancers are found in the majority (14/18) of the MEL-specific acetylation regions with ATAC-seq signal. This trend is also visible in the ATAC-seq-based library, with most of the active enhancers detected in ATAC-seq peaks (13/19) (Figure 2f). However, the moderate correlation between ATAC-seq signal and CHEQ-seq activity in the corresponding library (Spearman’s rho = 0.48) also indicates that the peak mean signal is not a good predictor of the activity level of an enhancer. In part, this may be due to confounding of ATAC and H3K27ac read depth by genomic copy number aberrations. Also, the activity displayed by some MES-specific regions lacking ATAC-seq signal in MM001 suggests that closed regions in the genome can still harbor activity in an episomal MPRA. In conclusion, our enhancer selection resulted in a high rate of active enhancers in MM001 and the design of our MPRA libraries allowed us to precisely pinpoint the origins of, at least part of, the enhancer activity. MES-specific H3K27ac/ATAC regions are active in MES lines Next, we transfected all libraries in the MES line MM029. The activity profiles in this line show that, as expected, the majority of MES enhancers display activity at both the H3K27ac and ATAC-seq level (Figure 3a and b, Figure 3—figure supplement 1a–e). In regions with two selected ATAC-seq peaks, both MPRA approaches agree on which peak is driving activity (Figure 3b and d, Figure 3—figure supplement 1c, Figure 3—figure supplement 2). Figure 3 with 2 supplements see all Download asset Open asset Enhancer activity in MM029. Enhancer activity profile for the CHEQ-seq intron H3K27ac library (a) and CHEQ-seq ATAC-seq library (b). Enhancer regions displayed in panel (c) and (d) have their name indicated in bold, and their value is displayed with a different shape. Enhancer activity of regions selected around the COL5A1 (c) and SERPINE1 (d) genes. JUN and JUNB ChIP-seq and H3K27ac ChIP-seq and ATAC-seq for MM029 are displayed, and, in the zoomed-in regions (light gray areas), DeepMEL2 predictions and CHEQ-seq values of the enhancer tiling are represented. Dark gray areas are regions not covered by the tiling library. CHEQ-seq activity is visible in the ‘H3K27ac enhancers’ and ‘ATAC-seq enhancers’ tracks. Benjamini–Hochberg adjusted p-values: *<0.05; ***<0.001. Dashed boxes: regions not recovered following DNA synthesis, cloning, or massively parallel reporter assay (MPRA). Heatmaps of H3K27ac library (e) and ATAC-seq library (f), displaying H3K27ac ChIP-seq signal, ATAC-seq signal, and enhancer activity in MM001 ordered by MPRA values. Only MPRA values of significantly active enhancers are displayed. Figure 3—source data 1 CHEQ-seq tiling library A activity. Enhancer activity values for the CHEQ-seq enhancer tiling library A in all cell lines. These data were also used in Figures 4 and 6, Figure 3—figure supplement 2, Figure 4—figure supplement 1, Figure 5—figure supplement 1, and Figure 6—figure supplement 1. https://cdn.elifesciences.org/articles/71735/elife-71735-fig3-data1-v2.txt Download elife-71735-fig3-data1-v2.txt Figure 3—source data 2 DeepMEL2 prediction scores for tiling library A. DeepMEL2_GABPA prediction scores for the enhancer tiling A library. These data were also used in Figure 6, Figure 3—figure supplement 2, and Figure 4—figure supplement 1. https://cdn.elifesciences.org/articles/71735/elife-71735-fig3-data2-v2.txt Download elife-71735-fig3-data2-v2.txt Multiple regions around the Collagen Type V Alpha 1 Chain (COL5A1) gene, up to 100 kb upstream of the TSS, were found to be specifically acetylated in MES lines, and we included a total of seven into the library (Figure 3c.). Four regions showed significant activity in MM029 (Figure 3c). This activity is further confirmed in the ATAC-seq and tiling libraries: the ATAC-seq peaks within the four active H3K27ac regions are all active, while the ATAC-seq peaks within the three negative H3K27ac regions are all negative. In the COL5A1_–17-D region, tiling suggests the presence of three distinct enhancers, including two that are located within small ATAC-seq peaks that were not selected for the ATAC-based library. DeepMEL2 is trained on both MEL- and MES-accessible regions, and topic 19 has been shown to be the best-performing MES topic (Atak et al., 2021). In our data, high topic 19 predictions often overlap with active tiles in MM029 (0.181 precision, 0.537 recall, Figure 3c and d). Because several other cis-regulatory topics contribute to the prediction of the MES subtype, some tiles do not display a high prediction score for topic 19 despite their activity and are better explained by other topics. The small shift between each tile and the use of two overlapping libraries provide a high number of measurements throughout the regions, which allows for accurate detection of weakly active enhancers. Such enhancers are found in the SERPINE1_–110-I region, where the SERPINE1_–110-IA ATAC-seq peak is inactive in the CHEQ-seq and STARR-seq assays (Figure 3d, Figure 3—figure supplement 1c), but the tiling assay shows robust enhancer activity. Of the two ATAC-seq peaks in the FOSL2_–117-I region, neither one recapitulates the activity of the acetylated region. In contrast, the tiling assay reveals clear enhancer activity in both peaks (Figure 3—figure supplement 2b). Similar conclusions to those above for MEL enhancers can now be drawn for MES regions regarding the relationship between H3K27ac, ATAC-seq signal, and enhancer activity (Figure 3e and f). ATAC-based and tiling CHEQ-seq assays show that most active enhancers in the H3K27Ac regions reside within ATAC-seq peaks (151/164 active tiles are found in ATAC-seq peaks; 92.1%). Irrespective, ATAC-seq peak mean signal remains a poor predictor of the level of enhancer activity, at least as read out by CHEQ-seq. Moreover, the presence of a differentially accessible ATAC-seq peak does not guarantee enhancer function. Indeed, of 26 differentially accessible peaks, only 14 show demonstrable activity (54%). In conclusion, MPRAs performed in MM001 and MM029 have shown a high success rate of selected regions to display activity specifically in their corresponding cell state. However, MM001 and MM029 lie at the extremes of the MEL-MES spectrum. To further investigate how the activity of these regions scales along the MEL-MES axis, we studied them in five additional melanoma cell lines, representing more intermediate or transitory melanoma states (Tsoi et al., 2018; Wouters et al., 2020). MES enhancers show lower but consistent activity in intermediate lines We expanded our panel of cell lines to include two additional MES lines (MM047 and MM099) and three MEL-intermediate lines (MM057, MM074, and MM087). These three lines have high SOX10 and MITF expression (hallmarks of the MEL subtype), yet also show both marker expression and phenotypic characteristics typical for the MES subtype (e.g., AXL expression, TGFb1 signaling activity) (Tsoi et al., 2018). Furthermore, in contrast to MM001, when SOX10 expression is lost, these lines shift toward a MES subtype, with expression of AP-1 and increased chromatin accessibility in MES-specific regions (Wouters et al., 2020). MM057, MM074, and MM087 were part of the cell lines used for the selection of MEL-specific H3K27ac ChIP-seq regions. As such, they display an acetylation and accessibility profile as well as a transcriptional activity of the associated genes similar to MM001 (Figure 1—figure supplement 1b–d). Based on those observations, even though phenotypically different, the MEL-intermediate lines were expected to have an enhancer profile closely related to what we have observed with MM001. Indeed, the enhancer activity profile for the H3K27ac library obtained in intermediate lines correlates well with MM001, except for MM074, which moderately correlates with all lines (Figure 4a). Interestingly, intermediate lines have the same proportion of active MEL enhancers as MM001 and the same proportion of active MES enhancers as the MES lines (Figure 4b). When looking at the mean activity of each enhancer per cell line phenotype (Figure 4c), we found a good correlation of MES region activity between all phenotypes (MES vs. MM001, r = 0.64; MES vs. intermediate r = 0.89). The only difference resides in the strength of the enhancer activity, where MM001 has weak activity and intermediate lines have moderate activity for MES regions. Figure 4 with 1 supplement see all Download asset Open asset Specificity of melanocytic (MEL) and mesenchymal-like (MES) enhancers in intermediate lines. (a–c) Pearson’s correlation coefficient table across 7 malignant melanoma (MM) lines (a), percentage of active MEL and MES enhancers for each line (b), and scatterplot of CHEQ-seq results for the CHEQ-seq intron H3K27ac library. (d–f) same as panels (a–c) but for the CHEQ-seq ATAC-seq library. (g–i) same as panels (a–c) but for the combined CHEQ-seq enhancer tiling libraries. Red, purple, and blue bars next to cell line names and background indicate MES, intermediate, and MEL lines respectively. The comparison of enhancer activity between subtypes highlights the particular case of the SOX10_–56-D region, a MEL-specific H3K27Ac region with enhancer activity in all cell lines (Figure 4c). Based on the tiling profile of that region across all tested lines (Figure 4—figure supplement 1a), we can identify two active enhancers. One enhancer, located within the largest ATAC-seq peak, is MEL-specific and overlaps with the DeepMEL2 predictions for
Summary The Drosophila brain is a work horse in neuroscience. Single-cell transcriptome analysis 1–5, 3D morphological classification 6 , and detailed EM mapping of the connectome 7–10 have revealed an immense diversity of neuronal and glial cell types that underlie the wide array of functional and behavioral traits in the fruit fly. The identities of these cell types are controlled by – still unknown – gene regulatory networks (GRNs), involving combinations of transcription factors that bind to genomic enhancers to regulate their target genes. To characterize the GRN for each cell type in the Drosophila brain, we profiled chromatin accessibility of 240,919 single cells spanning nine developmental timepoints, and integrated this data with single-cell transcriptomes. We identify more than 95,000 regulatory regions that are used in different neuronal cell types, of which around 70,000 are linked to specific developmental trajectories, involving neurogenesis, reprogramming and maturation. For 40 cell types, their uniquely accessible regions could be associated with their expressed transcription factors and downstream target genes, through a combination of motif discovery, network inference techniques, and deep learning. We illustrate how these “enhancer-GRNs” can be used to reveal enhancer architectures leading to a better understanding of neuronal regulatory diversity. Finally, our atlas of regulatory elements can be used to design genetic driver lines for specific cell types at specific timepoints, facilitating the characterization of brain cell types and the manipulation of brain function.