Chromatin assembled with centromere protein A (CENP-A) is the epigenetic mark of centromere identity. Using new reference models, we now identify sites of CENP-A and histone H3.1 binding within the megabase, α-satellite repeat–containing centromeres of 23 human chromosomes. The overwhelming majority (97%) of α-satellite DNA is found to be assembled with histone H3.1–containing nucleosomes with wrapped DNA termini. In both G1 and G2 cell cycle phases, the 2–4% of α-satellite assembled with CENP-A protects DNA lengths centered on 133 bp, consistent with octameric nucleosomes with DNA unwrapping at entry and exit. CENP-A chromatin is shown to contain equimolar amounts of CENP-A and histones H2A, H2B, and H4, with no H3. Solid-state nanopore analyses show it to be nucleosomal in size. Thus, in contrast to models for hemisomes that briefly transition to octameric nucleosomes at specific cell cycle points or heterotypic nucleosomes containing both CENP-A and histone H3, human CENP-A chromatin complexes are octameric nucleosomes with two molecules of CENP-A at all cell cycle phases.
The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations, because across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome.
Abstract The repetitive nature and complexity of multiple medically important genes make them intractable to accurate analysis, despite the maturity of short-read sequencing, resulting in a gap in clinical applications of genome sequencing. The Genome in a Bottle Consortium has provided benchmark variant sets, but these excluded some medically relevant genes due to their repetitiveness or polymorphic complexity. In this study, we characterize 273 of these 395 challenging autosomal genes that have multiple implications for medical sequencing. This extended, curated benchmark reports over 17,000 SNVs, 3,600 INDELs, and 200 SVs each for GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically important genes including CBS, CRYAA, and KCNE1. Our proposed solution improves variant recall in these genes from 8% to 100%. This benchmark will significantly improve the comprehensive characterization of these medically relevant genes and guide new method development.
Abstract Mobile elements and highly repetitive genomic regions are potent sources of lineage-specific genomic innovation and fingerprint individual genomes. Comprehensive analyses of large, composite or arrayed repeat elements and those found in more complex regions of the genome require a complete, linear genome assembly. Here we present the first de novo repeat discovery and annotation of a complete human reference genome, T2T-CHM13v1.0. We identified novel satellite arrays, expanded the catalog of variants and families for known repeats and mobile elements, characterized new classes of complex, composite repeats, and provided comprehensive annotations of retroelement transduction events. Utilizing PRO-seq to detect nascent transcription and nanopore sequencing to delineate CpG methylation profiles, we defined the structure of transcriptionally active retroelements in humans, including for the first time those found in centromeres. Together, these data provide expanded insight into the diversity, distribution and evolution of repetitive regions that have shaped the human genome.
In human chromosomes, centromeric regions comprise megabase-size arrays of 171 bp alpha-satellite DNA monomers. The large distances spanned by these arrays preclude their replication from external sites and imply that the repetitive monomers contain replication origins. However, replication within these arrays has not previously been profiled and the role of alpha-satellite DNA in initiation of DNA replication has not yet been demonstrated. Here, replication of alpha-satellite DNA in endogenous human centromeric regions and in de novo formed Human Artificial Chromosome (HAC) was analyzed. We showed that alpha-satellite monomers could function as origins of DNA replication and that replication of alphoid arrays organized into centrochromatin occurred earlier than those organized into heterochromatin. The distribution of inter-origin distances within centromeric alphoid arrays was comparable to the distribution of inter-origin distances on randomly selected non-centromeric chromosomal regions. Depletion of CENP-B, a kinetochore protein that binds directly to a 17 bp CENP-B box motif common to alpha-satellite DNA, resulted in enrichment of alpha-satellite sequences for proteins of the ORC complex, suggesting that CENP-B may have a role in regulating the replication of centromeric regions. Mapping of replication initiation sites in the HAC revealed that replication preferentially initiated in transcriptionally active regions.
Abstract The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and by 34% compared to using previous diversity sets of genome assemblies.
Abstract

Although originally thought to be silent chromosomal regions, centromeres are instead actively transcribed. However, the behavior and contributions of centromere-derived RNAs have remained unclear. Here, we used single-molecule fluorescence in-situ hybridization (smFISH) to detect alpha-satellite RNA transcripts in intact human cells. We find that alpha-satellite RNA-smFISH foci levels vary across cell lines and over the cell cycle, but do not remain associated with centromeres, displaying localization consistent with other long non-coding RNAs. Alpha-satellite expression occurs through RNA polymerase II-dependent transcription, but does not require established centromere or cell division components. Instead, our work implicates centromere–nucleolar interactions as repressing alpha-satellite expression. The fraction of nucleolar-localized centromeres inversely correlates with alpha-satellite transcript levels across cell lines and transcript levels increase substantially when the nucleolus is disrupted. The control of alpha-satellite transcripts by centromere-nucleolar contacts provides a mechanism to modulate centromere transcription and chromatin dynamics across diverse cell states and conditions.

Introduction

Chromosome segregation requires the function of a macromolecular kinetochore structure to connect chromosomal DNA and spindle microtubule polymers. Kinetochores assemble at the centromere region of each chromosome. The position of centromeres is specified epigenetically by the presence of the histone H3-variant, CENP-A, such that specific DNA sequences are neither necessary nor sufficient for centromere function (McKinley and Cheeseman, 2016).
However, despite the lack of strict sequence requirements, centromere regions are typically characterized by repetitive DNA sequences, such as the alpha-satellite repeats found at human centromeres. Understanding centromere function requires knowledge of the centromere-localized protein components, as well as a clear understanding of the nature and dynamics of centromere chromatin. Although originally thought to be silent chromosome regions, centromeres are actively transcribed (Perea-Resa and Blower, 2018). Prior work has detected α-satellite transcription at centromere and pericentromere regions based on the localization of RNA polymerase II (Bergmann et al., 2012; Chan et al., 2012) and the production of centromere RNA transcripts (Chan et al., 2012; Saffery et al., 2003; Wong et al., 2007). Centromere transcription and the resulting RNA transcripts have been proposed to play diverse roles in kinetochore assembly and function (Biscotti et al., 2015; Blower, 2016; Fachinetti et al., 2013; Ferri et al., 2009; Grenfell et al., 2016; Ideue et al., 2014; McNulty et al., 2017; Quénet and Dalal, 2014; Rošić and Erhardt, 2016; Wong et al., 2007). However, due to limitations for analyses of centromere transcripts that average behaviors across populations of cells and based on varying results between different studies, the nature, behavior, and contributions of centromere-derived RNAs have remained incompletely understood. Here, we used single-molecule fluorescence in-situ hybridization (smFISH) to detect alpha-satellite RNA transcripts in individual, intact human cells. Our results define the parameters for the expression and localization of centromere and pericentromere-derived transcripts across a range of conditions. 
We find that the predominant factor controlling alpha-satellite transcription is the presence of centromere–nucleolar contacts, providing a mechanism to modulate centromere transcription and the underlying chromatin dynamics across diverse cell states and conditions.

Results and discussion

Quantitative detection of alpha-satellite RNAs by smFISH

Prior work analyzed centromere RNA transcripts primarily using population-based assays, such as RT-qPCR and RNA-seq, or detected centromere RNAs in spreads of mitotic chromosomes. To visualize alpha-satellite RNA transcripts in individual intact human cells, we utilized single-molecule fluorescence in-situ hybridization (smFISH), a strategy that has been used to detect mRNAs and cellular long non-coding RNAs (lncRNAs) (Raj et al., 2008). The high sensitivity of smFISH allows for the accurate characterization of the number and spatial distribution of RNA transcripts. Alpha-satellite DNA is degenerate such that it can vary substantially between different chromosomes with the presence of higher-order repeats of alpha-satellite variants (Waye and Willard, 1987; Willard and Waye, 1987b). Thus, we first designed targeted probe sets to detect RNAs derived from centromere regions across multiple chromosomes: (1) sequences complementary to a pan-chromosomal consensus alpha-satellite sequence (labeled as ‘ASAT’), (2) sequences that target supra-chromosomal family 1 (SF1) higher-order arrays, present on chromosomes 1, 3, 5, 6, 7, 10, 12, 16, and 19 (labeled as ‘SF1’) (Alexandrov et al., 2001; Uralsky et al., 2019), and (3) sequences that are enriched for transcripts from the supra-chromosomal family 3 (SF3) higher-order arrays present on chromosomes 1, 11, 17, and X (labeled as ‘SF3’), with an increased number of targets on chromosome 17 (D17Z1) (Willard and Waye, 1987a).
Second, we designed probes that detect sequences enriched on specific chromosomes including the X chromosome (DXZ1, labeled ‘X’; Miga et al., 2014; Willard et al., 1983) and chromosome 7 (D7Z2, labeled as ‘7.2’; Waye et al., 1987). For complete sequence information and an analysis of sequence matches to different chromosomes, see Supplementary files 1 and 2. Alpha-satellite DNA can span megabases of DNA on a chromosome, whereas the active centromere region is predicted to be as small as 100 kb in many cases (McKinley and Cheeseman, 2016). Thus, these smFISH probes will detect RNA transcripts from both the active centromere region and flanking pericentric alpha-satellite DNA. In asynchronously cycling HeLa cells, we detected clear foci using smFISH probe sets for ASAT, SF1, and SF3 (Figure 1A). To ensure that this signal was not due to non-specific hybridization of the RNA probes to genomic DNA, we treated cells with RNase A prior to hybridization. The RNA-FISH signal was diminished substantially after RNase A treatment (Figure 1A,B), confirming the ribonucleic source of the observed signal. As an additional validation that these probes recognize alpha-satellite-derived sequences, we used them in a modified procedure to conduct DNA FISH. DNA FISH revealed multiple DNA-associated puncta that were distributed throughout the nucleus in interphase and aligned along the spindle axis on metaphase/anaphase chromosomes (Figure 1—figure supplement 1A), consistent with the behavior of centromere regions. In contrast to the ASAT, SF1, and SF3 probes, we did not detect smFISH foci using oligos designed to recognize transcripts derived from the centromere regions of chromosome 7 or the X chromosome (Figure 1—figure supplement 1B). As the absence of signal could reflect a variety of technical features of probe design or a detection limit for the expression level or length of these sequences, we chose not to pursue these probes further.
To quantify the number of distinct RNA-FISH foci, we used CellProfiler (Carpenter et al., 2006) to measure the number of foci per nucleus systematically using z-projections of the acquired images. The number of smFISH foci varied between individual cells, but averaged approximately four foci/cell for the ASAT, SF1, and SF3 probe sets in HeLa cells (Figure 1C).

Figure 1. Quantitative detection of centromere RNAs using smFISH. (A) Detection of alpha-satellite RNA transcripts by smFISH in asynchronous HeLa cells. Designed probes detected RNAs derived from centromeres across subsets of multiple chromosomes, but with distinct specificity (see Supplementary file 2; ASAT, SF1, and SF3 repeats). Treatment of cells with RNase A prior to hybridization diminished RNA-smFISH signals. (B) Quantification of smFISH foci in the presence or absence of RNase A treatment indicates that the signal observed is due to a ribonucleic source. Points represent the number of foci per cell for each cell tested. Error bars represent the mean and standard deviation of at least 100 cells. (C) Detection of antisense alpha-satellite transcripts in HeLa cells for the ASAT smFISH probe sequences. Error bars represent the mean and standard deviation of at least 100 cells. (D) Images showing varying abundance of alpha-satellite RNA across cell lines (based on smFISH foci), with RPE-1 cells displaying overall lower levels of centromere smFISH foci. For the RPE-1 + p53 KO condition, p53 was eliminated using an established TP53 iKO cell line (McKinley and Cheeseman, 2017). (E) Left, quantification indicating the variation of smFISH foci across selected cell lines. Error bars represent the mean and standard deviation of at least 100 cells. Right, average smFISH foci/cell for multiple independent replicates to enable statistical comparisons. p-values represent T-tests conducted on replicates of smFISH foci numbers for each selected cell line.
(F) Graph showing quantification of RT-qPCR for alpha-satellite transcripts from chromosome 21. Chromosome 21 alpha-satellite RNA was not detected in RPE-1 cells and its level was therefore set to 0 in the figure. The levels of alpha-satellite transcripts in RPE-1 cells are reduced compared to HeLa cells. A semi-quantitative assessment of the RT-PCR data (with no standard curve interpolation, see Figure 1—figure supplement 1D) indicated a ~20-fold reduction in alpha-satellite transcripts in RPE-1 cells relative to HeLa. We performed three biological replicates of the RT-qPCR. Scale bars, 25 µm.

Figure 1—source data 1. Source data for the RT-qPCR experiments shown in Figure 1F and Figure 1—figure supplement 1, panel D. https://cdn.elifesciences.org/articles/59770/elife-59770-fig1-data1-v2.xlsx

Transcription of non-coding RNAs often occurs from both strands of DNA at a given locus. We therefore tested whether we could detect antisense (relative to the ‘sense’ probes used above) alpha-satellite transcripts in HeLa cells using smFISH. Indeed, for the ASAT probe sequences, we were able to visualize ~3 foci/cell using antisense smFISH probes, similar to numbers using the sense probe set (four foci/cell) (Figure 1C). Antisense transcription at the centromere has also been previously reported across a variety of species (Carone et al., 2009; Choi et al., 2011; Chueh et al., 2009; Ideue et al., 2014; Koo et al., 2016; Li et al., 2008; May et al., 2005). The level of transcription for centromeric and pericentric satellite DNA has been proposed to vary between developmental stages and tissue types (Maison et al., 2010; Pezer and Ugarković, 2008). In addition, changes in centromere and pericentromere transcription have been observed in cancers (Ting et al., 2011). Therefore, we next sought to analyze differences in smFISH foci across different cell lines using the ASAT and SF1 probe sets.
We selected the chromosomally-unstable osteosarcoma cell line U2OS, the breast cancer cell line MCF7, and the immortalized, but non-transformed hTERT-RPE-1 cell line. We found that the levels of alpha-satellite transcripts varied modestly across cell lines (Figure 1D,E; Figure 1—figure supplement 1C), with RPE-1 cells displaying overall lower levels of smFISH foci. As an additional confirmation of these behaviors, we tested the presence of alpha-satellite transcripts by RT-qPCR. Using a previously validated RT-qPCR primer pair against the alpha-satellite array on chromosome 21 (Molina et al., 2016; Nakano et al., 2003), we observed dramatically reduced levels of alpha-satellite transcripts in RPE-1 cells compared to HeLa cells (Figure 1—figure supplement 1D; Figure 1F). To test whether the transformation status of the cell line correlated with the level of smFISH foci, we eliminated the tumor suppressor p53 in RPE-1 cells using our previously-established inducible knockout strategy (McKinley and Cheeseman, 2017). Eliminating p53 did not substantially alter the levels of alpha-satellite smFISH foci in RPE-1 cells (Figure 1D,E), indicating that other factors likely contribute to the observed cellular levels of alpha-satellite RNA transcripts. Together, this strategy provides the ability to quantitatively detect centromere and pericentromere-derived alpha-satellite RNA transcripts using smFISH probes against alpha-satellite sequences and demonstrates that human cell lines display varying levels of alpha-satellite transcripts.

Analysis of alpha-satellite transcript localization and cell-cycle control

We next sought to assess the localization of alpha-satellite RNA transcripts within a cell. Prior work suggested that non-coding centromere transcripts are produced in cis and remain associated with the centromere from which they are derived, including through associations with centromere proteins (McNulty et al., 2017).
Other studies support the action of centromere-derived RNAs in trans (Blower, 2016), but again acting at centromeres. To investigate the distribution of the centromere transcripts, we performed combined immunofluorescence and smFISH to visualize alpha-satellite transcripts relative to centromeres and microtubules. In interphase cells, smFISH foci localized within the nucleus (Figure 2A). Thus, unlike many mRNAs, alpha-satellite-derived RNAs are not exported to the cytoplasm. Although we detected colocalization of alpha-satellite RNAs with a subset of centromeres in HeLa cells, only ~10% of smFISH foci overlapped with centromeres (Figure 2A,B). In mitotic cells, smFISH foci did not associate with chromatin (Figure 2C). Instead, during all stages of mitosis, alpha-satellite RNA transcripts appeared broadly distributed within the cytoplasm. Finally, as the cells exited mitosis into G1, the smFISH foci remained distinct from the chromosomal DNA and were thus excluded from the nucleus when the nuclear envelope reformed (Figure 2D). Similar patterns of cell-cycle dependent localization changes with mitotic exclusion from chromatin have been reported for other cellular long non-coding RNAs (Cabili et al., 2015; Clemson et al., 1996). In contrast to our findings that alpha-satellite transcripts are primarily separable from centromere loci, prior work from others found close associations between alpha-satellite transcripts and centromeres (Blower, 2016; Bobkov et al., 2018; McNulty et al., 2017; Rošić et al., 2014). Based on these different behaviors, we hypothesize that the smFISH approach using the native fixation conditions detects mature alpha-satellite transcripts, but is unable to detect nascent RNAs in the process of transcription. Thus, once transcribed, alpha-satellite non-coding RNAs visualized by smFISH display nuclear localization, but are not tightly associated with the centromere regions from which they are derived. 
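The ~10% overlap reported above can be thought of as a simple proximity count. The following is a minimal sketch, not the authors' analysis pipeline: it assumes that foci and centromere centroids have already been extracted as 2D coordinates, and the function name `fraction_overlapping`, the example coordinates, and the 0.3 µm distance cutoff are all hypothetical choices made for illustration.

```python
import math


def fraction_overlapping(foci, centromeres, max_dist):
    """Return the fraction of foci lying within max_dist of any centromere.

    foci, centromeres: lists of (x, y) centroids in the same units.
    max_dist: hypothetical cutoff below which a focus counts as overlapping.
    """
    if not foci:
        return 0.0
    hits = 0
    for fx, fy in foci:
        # A focus is a hit if at least one centromere is close enough.
        if any(math.hypot(fx - cx, fy - cy) <= max_dist
               for cx, cy in centromeres):
            hits += 1
    return hits / len(foci)


# Made-up coordinates (in µm) for one nucleus; not data from the study.
foci = [(1.0, 1.0), (4.0, 4.0), (7.5, 2.0), (3.0, 8.0), (9.0, 9.0),
        (0.5, 6.0), (6.0, 6.5), (2.0, 3.5), (8.0, 5.0), (5.0, 0.5)]
centromeres = [(1.1, 1.0), (6.0, 3.0), (2.5, 6.0)]

print(fraction_overlapping(foci, centromeres, max_dist=0.3))  # prints 0.1
```

In this toy example only the first focus lies within the cutoff of a centromere, so one of ten foci is counted as overlapping. In practice the cutoff would depend on the imaging resolution, and the per-cell fractions would be reported across many cells, as in Figure 2B.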
Figure 2. Analysis of centromere RNA foci across the cell cycle. (A) Immunofluorescence images (using anti-tubulin antibodies in green and anti-centromere antibodies (ACA) in red) showing alpha-satellite derived transcripts (smFISH; ASAT probe sets) localized to the nucleus during interphase in HeLa cells. The majority of detected transcripts do not co-localize with centromeres. (B) Graph showing the fraction of ASAT smFISH foci that overlap with centromeres by immunofluorescence. Each point represents one cell. n = 36 cells. (C) Immunofluorescence of HeLa cells (as in A) throughout the cell cycle reveals smFISH foci are separable from chromatin in mitosis. (D) Immunofluorescence-smFISH analysis indicates that progression of cells into G1 (defined by cells with a mid-body) results in the nuclear exclusion of smFISH foci. Left: Foci are located in the cytoplasm after the nuclear envelope reforms. Right: Foci are absent, possibly reflecting the degradation of cytoplasmic RNA. (E) Quantification of smFISH foci throughout the cell cycle (for either ASAT or SF1 probe sets) reveals that transcript levels are high in S/G2 and mitotic cells, but reduced as cells exit mitosis into G1. A T-test was conducted on independent replicates of the ASAT smFISH data for each selected cell-cycle state. Error bars represent the mean and standard deviation of at least 8 cells/replicate. Scale bars, 10 µm.

We next analyzed the temporal changes in alpha-satellite transcript numbers during the cell cycle. In contrast to other genomic loci, RNA Polymerase II is present at human and murine centromeres during mitosis (Chan and Wong, 2012; Perea-Resa et al., 2020). In addition, centromere transcription during G1 has been proposed to play a role in CENP-A loading (Bobkov et al., 2018; Chen et al., 2015; Quénet and Dalal, 2014).
Recent work measuring the levels of satellite transcripts originating from specific centromeres in human cells suggested the presence of stable RNA levels during the entire cell cycle (McNulty et al., 2017). smFISH provides the capacity to measure the levels of alpha-satellite transcripts in individual cells over the course of the cell cycle. We utilized combined immunofluorescence-smFISH to simultaneously label alpha-satellite RNA transcripts and microtubules, allowing us to distinguish between G1 cells (due to the presence of a mid-body), an S/G2 interphase population, and mitotic cells. In contrast to previous observations, our analysis revealed that the transcripts detected by our smFISH method increased in S/G2 and remained stable throughout mitosis (Figure 2E). We note that a G2/M peak of transcript levels has been reported for murine Minor Satellite transcripts (Ferri et al., 2009). However, as cells exited mitosis into G1, transcripts detected by smFISH were reduced (Figure 2E). We speculate that this may result from the nuclear exclusion of the existing alpha-satellite transcripts, which would make them more susceptible to degradation by cytoplasmic RNases. Thus, alpha-satellite transcript levels fluctuate over the cell cycle with G1 as a period of low transcript numbers, either indicating reduced transcription during this cell-cycle stage or the increased elimination of alpha-satellite-derived RNA transcripts.

Alpha-satellite RNAs are products of Pol II-mediated transcription

Previous studies have suggested that centromeres are actively transcribed by RNA polymerase II. RNA polymerase II localizes to centromeres in S.
pombe, Drosophila melanogaster, and human cells, including at centromeric chromatin on human artificial chromosomes (HACs) and at neocentromeres (Bergmann et al., 2011; Catania et al., 2015; Chan and Wong, 2012; Chueh et al., 2009; Ferri et al., 2009; Li et al., 2008; Ohkuni and Kitagawa, 2011; Perea-Resa et al., 2020; Quénet and Dalal, 2014; Rošić et al., 2014; Wong et al., 2007). However, it remains possible that additional polymerases contribute to the transcription of alpha-satellite regions. To determine the polymerases that are responsible for generating the alpha-satellite transcripts detected by our smFISH assay, we treated HeLa cells with small-molecule inhibitors against all three RNA polymerases. We found a significant reduction in alpha-satellite smFISH foci following inhibition of RNA Polymerase II activity using the small-molecule THZ1 (Figure 3A–C; Figure 3—figure supplement 1A,B), which targets the RNA Pol II activator Cdk7 (Kwiatkowski et al., 2014). In contrast, we did not detect a reduction in smFISH foci following treatment with inhibitors against RNA polymerase I (small-molecule inhibitor BMH-21; Colis et al., 2014) or RNA polymerase III (ML-60218; Wu et al., 2003; Figure 3A–C; Figure 3—figure supplement 1A,B). Instead, as discussed below, we found dramatically increased alpha-satellite smFISH foci following RNA polymerase I inhibition. Consistent with the effects of RNA polymerase I and II inhibition on alpha-satellite transcript levels as detected by smFISH, RT-qPCR analyses indicated substantially decreased chromosome 21 alpha-satellite transcripts following Cdk7 inhibition, but increased levels following RNA polymerase I inhibition (Figure 3D). This indicates that the alpha-satellite RNA transcripts detected by smFISH are products of RNA Pol II-mediated transcription.

Figure 3. Alpha-satellite RNAs are products of Pol II-mediated transcription.
(A) Treatment of HeLa cells with small-molecule inhibitors reveals that alpha-satellite transcription is mediated by RNA polymerase II. Cells were treated with the RNA Polymerase I inhibitor BMH-21 (24 hr), the RNA Polymerase III inhibitor ML-60218 (24 hr), or the Cdk7 inhibitor THZ1 (5 hr), which inhibits RNA Polymerase II initiation. Transcripts were identified using the ASAT smFISH probe set. (B) Quantification of smFISH foci from (A) after treatment of HeLa cells with small-molecule inhibitors against Cdk7, RNA Pol I, and RNA Pol III. smFISH foci were substantially reduced after inhibition of the RNA Pol II activator Cdk7, but increased by RNA Pol I inhibition. Error bars represent the mean and standard deviation of at least 240 cells. (C) Graph showing independent replicates of ASAT smFISH foci for each small-molecule inhibitor treatment (Cdk7, RNA Pol I, and RNA Pol III). P-values represent T-tests for the indicated comparisons. (D) RT-qPCR quantification reveals significantly reduced levels of chromosome 21 alpha-satellite transcripts in cells treated with the Cdk7 inhibitor THZ1 for 5 hr, but increased levels following RNA polymerase I inhibition (24 hr treatment) when compared to control HeLa cells. The level of chromosome 21 alpha-satellite RNA detected in cells treated with the Cdk7 inhibitor was outside of our quantifiable range and was thus set to 0. The mean of 3 biological replicates was plotted and error bars represent the standard deviation. P-value represents the results of a T-test.

Figure 3—source data 1. Source data for the RT-qPCR experiments shown in Figure 3D. https://cdn.elifesciences.org/articles/59770/elife-59770-fig3-data1-v2.xlsx

Functional analysis of the protein requirements for alpha-satellite transcripts

We next sought to determine the requirements for the production of alpha-satellite transcripts.
Centromere DNA functions as a platform for assembly of the kinetochore structure (McKinley and Cheeseman, 2016), an integrated scaffold of protein interactions that mediates the connection between the DNA and microtubules of the mitotic spindle. One possibility to explain the observed transcription of centromere regions, including at neocentromere loci lacking alpha-satellite sequences, is that centromere and kinetochore components act to recruit the RNA Polymerase machinery. To test this, we selectively eliminated diverse centromere and kinetochore components using a panel of CRISPR inducible knockout cell lines expressing dox-inducible Cas9 and guide RNAs (McKinley and Cheeseman, 2017; McKinley et al., 2015). We targeted the centromere-specific H3 variant CENP-A, the CENP-A chaperone HJURP (to block new CENP-A incorporation), the centromere alpha-satellite DNA binding protein CENP-B, the constitutive centromere components CENP-C, CENP-N, and CENP-W, and the outer kinetochore microtubule-binding protein Ndc80. Our prior work has documented the efficacy of each of these inducible knockout cell lines (McKinley and Cheeseman, 2017; McKinley et al., 2015). Consistently, we found that the gene targets were effectively eliminated from centromeres throughout the population for the CENP-A, CENP-B, and CENP-C inducible knockout cell lines (Figure 4—figure supplement 1A; also see McKinley et al., 2015). Eliminating these centromere and kinetochore components did not prevent the presence of alpha-satellite RNA-smFISH foci (Figure 4A). In contrast, the number of foci/cell increased in many of these inducible knockout cell lines, from moderate increases in most knockout cell lines to a substantial increase in CENP-C inducible knockout cells (Figure 4A).
This suggests that centromere components are not required for the specific recruitment of RNA Polymerase II to centromere regions, although active centromeres may act to retain RNA Polymerase II during mitosis due to the persistence of sister chromatid cohesion (Perea-Resa et al., 2020).

Figure 4. Eliminating CENP-C results in substantially increased alpha-satellite transcript numbers. (A) Quantification of smFISH foci (ASAT probe set) after elimination of selected centromere and kinetochore components reveals that centromere components are not required for the production of alpha-satellite transcripts. Inducible knockouts were generated with Cas9 in previously described cell lines (McKinley and Cheeseman, 2017; McKinley et al., 2015). Notably, inducible knockout of CENP-C results in a substantial increase in smFISH foci. Error bars represent the mean and standard deviation of at least 240 cells. (B) Representative images showing the substantial increase in smFISH foci after elimination of the centromere component CENP-C. (C) Quantification of ASAT smFISH foci under the indicated conditions. The increase in alpha-satellite transcripts in cells depleted for CENP-C depends on RNA Polymerase II, as THZ1 treatment (Cdk7 inhibition; 5 hr) resulted in a substantial reduction in smFISH foci in both control cells and CENP-C inducible knockout cells. (D) Quantification of smFISH foci in CENP-C inducible knockout RPE-1 cells reveals that the increase in alpha-satellite transcripts following CENP-C knockout is not specific to HeLa cells. Error bars represent the mean and standard deviation of at least 170 cells. (E) RT-qPCR for alpha-satellite transcripts from chromosome 21 indicates a substantial increase in steady state alpha-satellite RNA levels in HeLa CENP-C inducible knockout cells. The mean of three biological replicates for control and four biological replicates for the CENP-C inducible knockouts was plotted.
Error bars represent the standard deviation. P-value represents the results of a T-test. (F) Quantification of smFISH foci number in CENP-C inducible KO cells and Pol I-inhibited (24 hr treatment) cells compared to HeLa cell controls. (G) Quantification of the intensity of individual smFISH foci from the same experiment tested in F showing similar intensities despite the increase in foci number. (H) The half-life of alpha-satellite RNAs derived from chromosome 21 was determined in HeLa and CENP-C inducible knockout cells by RT-qPCR various times following RNA polymerase II inhibition (THZ1 treatment). The level of chromosome 21 alpha-satellite RNA was normalized to GAPDH, a stable mRNA. The half-life of these centromeric transcripts is 78 and 72 min in HeLa and CENP-C inducible knockout cells, respectively. Graph shows mean and standard deviation for two biological replicates. Scale bars, 25 µm. Figure 4—source data 1 Source data for the RT-qPCR experiments shown in Figure 4D and H. https://cdn.elifesciences.org/articles/59770/elife-59770-fig4-data1-v2.xlsx Download elife-59770-fig4-data1-v2.xlsx We also tested the contribution of non-centromere-localized cell division components to alpha-satellite transcription. Because of its DNA-based nature, the centromere is subject to cell-cycle-specific challenges that include chromatin condensation, cohesion, and DNA replication. We thus sought to assess whether disruption of any of these complexes would influence alpha-satellite RNA transcript levels. To do this, we targeted proteins involved in centromere regulation (Sgo1 and BubR1), DNA replication (Mcm6, Gins1, Orc1, and Cdt1), sister chromatid cohesion (ESCO2, Scc1), chromosome condensation (Smc2, CAPG, CAPG2, TOP2A), and nucleosome remodeling (SSRP1). 
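The half-life measurement described in the Figure 4H legend (RNA levels normalized to GAPDH at several times after THZ1 addition) can be illustrated with a minimal sketch. The text does not state how the decay curves were fitted, so the sketch assumes simple first-order decay and a log-linear least-squares fit. The function name `fit_half_life`, the time points, and the input values are hypothetical; the values are synthetic, generated from an assumed 78 min half-life, and are not data from the study.

```python
import math


def fit_half_life(times_min, relative_levels):
    """Estimate an RNA half-life (in minutes) from a decay time course.

    Assumes first-order decay, level(t) = level(0) * exp(-k * t), and fits
    ln(level) against time by ordinary least squares. This is an assumed
    procedure, not necessarily the one used for Figure 4H.
    """
    if len(times_min) != len(relative_levels) or len(times_min) < 2:
        raise ValueError("need at least two matched time points")
    if any(v <= 0 for v in relative_levels):
        raise ValueError("normalized levels must be positive")

    log_levels = [math.log(v) for v in relative_levels]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(log_levels) / n

    # Least-squares slope of ln(level) versus time; the slope equals -k.
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y)
              for t, y in zip(times_min, log_levels))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("levels do not decay over time")

    return math.log(2) / -slope


# Synthetic, noise-free time course (hypothetical time points, in minutes)
# generated from an assumed 78 min half-life.
times = [0, 30, 60, 120, 240]
levels = [math.exp(-math.log(2) / 78 * t) for t in times]

half_life = fit_half_life(times, levels)
print(f"estimated half-life: {half_life:.1f} min")
```

Because the synthetic input is noise-free, the fit recovers the assumed half-life. With real replicate measurements, each replicate could be fitted separately and the estimates summarized as a mean and standard deviation.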
Strikingly, despite the diverse roles of these proteins in different aspects of centromere function, none of these inducible knockouts resulted in reduced levels of alpha-satellite transcripts as detected by smFISH analysis with the ASAT probe set (Figure 4—figure supplement 1B). Instead, in many cases we detected a modest increase in alpha-satellite smFISH foci in the inducible knockout cells. Overall, our results indicate that alpha-satellite transcription does not require the presence of specific DNA binding proteins, DNA structures, or cell division components, and that multiple factors instead act to restrict transcription at centromeres.

CENP-C acts to repress alpha-satellite RNA levels

Of the proteins that we tested, eliminating CENP-C had a particularly substantial effect on the number of smFISH foci (Figure 4A). To confirm this behavior following the loss of CENP-C, we repeated these experiments for both the ASAT and SF1 smFISH probes (Figure 4B,C; Figure 4—figure supplement 1C). In both cases, we observed a strong increase in smFISH foci. To test whether this behavior was specific to HeLa cells, we analyzed the CENP-C inducible knockout in RPE-1 cells. Although there are fewer ASAT smFISH foci in the parental RPE-1 cells, eliminating CENP-C resulted in a strong increase in the number of ASAT smFISH foci (Figure 4D). Moreover, we observed a substantial increase in steady state alpha-satellite RNA levels in HeLa CENP-C inducible knockout cells based on RT-qPCR (Figure 4E). We also note that recent work found that CENP-C overexpression resulted in decreased RNA Polymerase II occupancy at centromere regions (Melters et al., 201
Abstract The publication of the first complete, haploid telomere-to-telomere (T2T) human genome revealed new insights into the structure and function of the heretofore “invisible” parts of the genome including centromeres, tandem repeat arrays, and segmental duplications. Refinement of T2T processes now enables comparative analyses of complete genomes across entire clades to gain a broader understanding of the evolution of chromosome structure and function. The human T2T project involved a unique ad hoc effort involving many researchers and laboratories, serving as a model for collaborative open science. Subsequent generation and analysis of diploid, near T2T assemblies for multiple species represents a substantial increase in scale and would be daunting for any single laboratory. Efforts focused on the primate lineage continue to employ the successful open collaboration strategy and are revealing details of chromosomal evolution, species-specific gene content, and genomic adaptations, which may be general or lineage-specific features. The suborder Ruminantia has a rich history within the field of chromosome biology and includes a broad range of species at varying evolutionary distances with separation of tens of millions of years to subspecies that are still able to interbreed. We propose an open collaborative effort dubbed the “Ruminant T2T Consortium” (RT2T) to generate complete diploid assemblies for species in the Artiodactyla order, focusing on suborder Ruminantia. Here we present the initial near T2T assemblies of cattle, gaur, domestic goat, bighorn sheep, and domestic sheep, and describe the motivation, goals, and proposed comparative analyses to examine chromosomal evolution in the context of natural selection and domestication of species for use as livestock.