Abstract Accurate and complete gene annotations are indispensable for understanding how genome sequences encode biological functions. For twenty years, the GENCODE consortium has developed reference annotations for the human and mouse genomes, which have become a foundation for biomedical and genomics communities worldwide. Nevertheless, collections of important yet poorly understood gene classes such as long non-coding RNAs (lncRNAs) remain incomplete and scattered across multiple uncoordinated catalogs, slowing progress in the field. To address these issues, GENCODE has undertaken the most comprehensive lncRNA annotation effort to date. This effort is founded on manual annotation of full-length transcripts obtained by targeted long-read sequencing of orthologous regions in human and mouse, using matched embryonic and adult tissues. Altogether, 17,931 novel human genes (140,268 novel transcripts) and 22,784 novel mouse genes (136,169 novel transcripts) have been added to the GENCODE catalog, representing a 2-fold and 6-fold increase in transcripts, respectively, and the greatest increase since the sequencing of the human genome. Novel gene annotations display evolutionary constraint, have well-formed promoter regions, and are linked to phenotype-associated genetic variants. They greatly enhance the functional interpretability of the human genome, as they help explain millions of previously mapped "orphan" omics measurements corresponding to transcription start sites, chromatin modifications and transcription factor binding sites. Crucially, our targeted design assigned human-mouse orthologs at a rate beyond previous studies, tripling the number of human disease-associated lncRNAs with mouse orthologs. The expanded and enhanced GENCODE lncRNA annotations mark a critical step towards deciphering the human and mouse genomes.
Lynch syndrome (LS), characterised by an increased risk of cancer, is mainly caused by germline pathogenic variants affecting a mismatch repair gene (MLH1, MSH2, MSH6, PMS2). Occasionally, LS is caused by a constitutional MLH1 epimutation (CME), characterised by soma-wide methylation of one allele of the MLH1 promoter. Most CMEs are "primary" epimutations, arising de novo without any apparent underlying cis-genetic cause, and are reversible between generations. We aimed to characterise genetic and gene regulatory changes associated with primary CME to elucidate possible underlying molecular mechanisms.
Mammalian gametogenesis involves dramatic and tightly regulated chromatin remodeling, whose regulatory pathways remain largely unexplored. Here, we generate a comprehensive high-resolution structural and functional atlas of mouse spermatogenesis by combining in situ chromosome conformation capture sequencing (Hi-C), RNA sequencing (RNA-seq), and chromatin immunoprecipitation sequencing (ChIP-seq) of CCCTC-binding factor (CTCF) and meiotic cohesins, coupled with confocal and super-resolution microscopy. Spermatogonia present well-defined compartment patterns and topological domains. However, chromosome occupancy and compartmentalization are highly rearranged during prophase I, with cohesins bound to active promoters in DNA loops outside the chromosomal axes. Compartment patterns re-emerge in round spermatids, where cohesin occupancy correlates with the transcriptional activity of key developmental genes. The compact sperm genome contains compartments with actively transcribed genes but no fine-scale topological domains, concomitant with the presence of protamines. Overall, we demonstrate how genome-wide cohesin occupancy and transcriptional activity are associated with three-dimensional (3D) remodeling during spermatogenesis, ultimately reprogramming the genome for the next generation.
Abstract Chromosomal fusions represent one of the most common types of chromosomal rearrangements found in nature. Yet their role in shaping the genomic landscape of recombination, and hence genome evolution, remains largely unexplored. Here, we take advantage of wild mouse populations with chromosomal fusions to evaluate the effect of this type of structural variant on genomic landscapes of recombination and divergence. To this end, we combined cytological analysis of meiotic crossovers in primary spermatocytes with inference of recombination rates from linkage disequilibrium among single nucleotide polymorphisms. Our results suggest a combined effect of Robertsonian fusions and the allelic background of Prdm9, a gene involved in the formation of meiotic double-strand breaks and in postzygotic reproductive isolation, in reshaping genomic landscapes of recombination. In mice with Robertsonian fusions, meiotic recombination in metacentric chromosomes was redistributed toward telomeric regions compared with nonfused mice. This repatterning was accompanied by increased crossover interference and reduced estimated recombination rates between populations, together with high levels of genomic divergence. Interestingly, Prdm9 allelic background was a major determinant of recombination rates at the population level, whereas Robertsonian fusions showed limited effects, restricted to the centromeric regions of fused chromosomes. Altogether, our results provide new insights into the effect of Robertsonian fusions and Prdm9 background on meiotic recombination.
Abstract Centromeres exert an inhibitory effect on meiotic recombination, but the possible contribution of satellite DNA to this "centromere effect" is under debate. In the horse, satellite DNA is present at all centromeres except that of chromosome 11. This organization of centromeres allowed us to investigate the role of satellite DNA in recombination suppression in horse spermatocytes at pachytene. To this end, we analysed the distribution of the MLH1 protein, a marker of recombination foci, relative to CENP-A, a marker of centromeric function. We demonstrated that the satellite-less centromere of chromosome 11 suppresses crossovers, similarly to satellite-based centromeres. These results suggest that the centromere effect does not depend on satellite DNA. During this analysis, we observed a peculiar phenomenon: while, as expected, the centromere of most meiotic bivalents was labelled with a single immunofluorescence centromeric signal, double-spotted or extended signals were also detected. Their number varied from 0 to 7 per cell. This observation can be explained by positional variation of the centromeric domain on the two homologs and/or misalignment of pericentromeric satellite DNA arrays during homolog pairing, confirming the great plasticity of equine centromeres.
During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, their distribution in the population is unknown. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be shorter than 100 amino acids and positively charged. We observe that de novo ORFs are more polymorphic in the population than canonical proteins, with a substantial fraction of them translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in translation levels. These results suggest that variation in the translation levels of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
Abstract Chromosome folding has profound impacts on gene regulation, whose evolutionary consequences are far from understood. Here we explore the relationship between 3D chromatin remodelling in mouse germ cells and evolutionary changes in genome structure. Using a comprehensive integrative computational analysis, we (i) reconstruct seven ancestral rodent genomes by analysing whole-genome sequences of 14 species representative of the major phylogroups, (ii) detect lineage-specific chromosome rearrangements and (iii) characterise the dynamics of the structural and epigenetic properties of evolutionary breakpoint regions (EBRs) throughout mouse spermatogenesis. Our results show that EBRs are devoid of programmed meiotic DNA double-strand breaks (DSBs) and meiotic cohesins in primary spermatocytes, but in post-meiotic cells are associated with sites of DNA damage and functional long-range interaction regions that recapitulate ancestral chromosomal configurations. Overall, we propose a model that integrates evolutionary genome reshuffling with DNA damage response mechanisms and the dynamic spatial genome organisation of germ cells.