Copy-number variations (CNVs) are a common cause of intellectual disability and/or multiple congenital anomalies (ID/MCA). However, the clinical interpretation of CNVs remains challenging, especially for inherited CNVs. Well-phenotyped patients (5,531) with ID/MCA were screened for rare CNVs using a 250K single-nucleotide polymorphism array platform in order to improve the understanding of the contribution of CNVs to a patient's phenotype. We detected 1,663 rare CNVs in 1,388 patients (25.1%; range 0-5 per patient) of which 437 occurred de novo and 638 were inherited. The detected CNVs were analyzed for various characteristics, gene content, and genotype-phenotype correlations. Patients with severe phenotypes, including organ malformations, had more de novo CNVs (P < 0.001), whereas patient groups with milder phenotypes, such as facial dysmorphisms, were enriched for both de novo and inherited CNVs (P < 0.001), indicating that not only de novo but also inherited CNVs can be associated with a clinically relevant phenotype. Moreover, patients with multiple CNVs presented with a more severe phenotype than patients with a single CNV (P < 0.001), pointing to a combinatorial effect of the additional CNVs. In addition, we identified 20 de novo single-gene CNVs that directly indicate novel genes for ID/MCA, including ZFHX4, ANKH, DLG2, MPP7, CEP89, TRIO, ASTN2, and PIK3C3.
Autism Spectrum Disorders (ASD) are highly heritable and characterised by impairments in social interaction and communication, and restricted and repetitive behaviours. Considering four sets of de novo copy number variants (CNVs) identified in 181 individuals with autism and exploiting mouse functional genomics and known protein-protein interactions, we identified a large and significantly interconnected interaction network. This network contains 187 genes affected by CNVs drawn from 45% of the patients we considered and 22 genes previously implicated in ASD, of which 192 form a single interconnected cluster. On average, those patients with copy number changed genes from this network possess changes in 3 network genes, suggesting that epistasis mediated through the network is extensive. Correspondingly, genes that are highly connected within the network, and thus whose copy number change is predicted by the network to be more phenotypically consequential, are significantly enriched among patients that possess only a single ASD-associated network copy number changed gene (p = 0.002). Strikingly, deleted or disrupted genes from the network are significantly enriched in GO-annotated positive regulators (2.3-fold enrichment, corrected p = 2×10⁻⁵), whereas duplicated genes are significantly enriched in GO-annotated negative regulators (2.2-fold enrichment, corrected p = 0.005). The direction of copy change is highly informative in the context of the network, providing the means through which perturbations arising from distinct deletions or duplications can yield a common outcome. These findings reveal an extensive ASD-associated molecular network, whose topology indicates ASD-relevant mutational deleteriousness and that mechanistically details how convergent aetiologies can result extensively from CNVs affecting pathways causally implicated in ASD.
Despite the availability of dozens of animal genome sequences, two key questions remain unanswered: First, what fraction of any species' genome confers biological function, and second, are apparent differences in organismal complexity reflected in an objective measure of genomic complexity? Here, we address both questions by applying, across the mammalian phylogeny, an evolutionary model that estimates the amount of functional DNA that is shared between two species' genomes. Our main findings are, first, that as the divergence between mammalian species increases, the predicted amount of pairwise shared functional sequence drops off dramatically. We show by simulations that this is not an artifact of the method, but rather indicates that functional (and mostly noncoding) sequence is turning over at a very high rate. We estimate that between 200 and 300 Mb (∼6.5%–10%) of the human genome is under functional constraint, which includes five to eight times as many constrained noncoding bases as bases that code for protein. In contrast, in D. melanogaster we estimate only 56–66 Mb to be constrained, implying a ratio of noncoding to coding constrained bases of about 2. This suggests that, rather than genome size or protein-coding gene complement, it is the number of functional bases that might best mirror our naïve preconceptions of organismal complexity.
Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25–0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequence we identify is not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci, which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1–5.0). From extrapolations we estimate that 8.2% (7.1–9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged.
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.
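The half-life figures quoted above can be made concrete with a small worked example. The sketch below assumes a simple exponential turnover model, in which the fraction of constrained sequence retained after divergence d is 2^(-d/d1/2). That model and the function names are illustrative assumptions, not the estimation procedure used in the study.

```python
import math


def fraction_retained(divergence, half_life):
    """Fraction of constrained sequence still retained after `divergence`,
    assuming simple exponential turnover (an assumption of this sketch)."""
    return 0.5 ** (divergence / half_life)


def divergence_for_fraction(fraction, half_life):
    """Divergence at which only `fraction` of constrained sequence remains,
    under the same exponential assumption."""
    return -half_life * math.log2(fraction)


if __name__ == "__main__":
    # Half-life ranges (in units of divergence time) as quoted in the abstract.
    half_lives = {
        "noncoding (lower)": 0.25,
        "noncoding (upper)": 0.31,
        "coding (lower)": 2.1,
        "coding (upper)": 5.0,
    }
    d = 0.3  # hypothetical divergence chosen only for illustration
    for label, d_half in half_lives.items():
        print(f"{label}: {fraction_retained(d, d_half):.3f} retained at d = {d}")
```

Under this assumption, one half-life of divergence leaves exactly half of the constrained sequence in place, and the much longer coding half-lives leave most coding sequence retained at the same divergence.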
Sequencing of the bonobo genome shows that more than three per cent of the human genome is more closely related to either the bonobo genome or the chimpanzee genome than those genomes are to each other. The chimpanzee and the bonobo are our species' two closest living relatives. This paper reports the genome sequence of the bonobo, the last ape to be sequenced. Comparative genomic analyses reveal that more than 3% of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. The results shed light on the ancestry of the two ape species and might eventually help us to understand the genetic basis of phenotypes that humans share with one or the other ape species. Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours [1,2,3,4], and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other.
Abstract 1. Sexual conflict can play an important role in the evolution of animal life‐history characteristics, including lifespan. Seaweed flies show an increase in mortality rates when exposed to brown algae. The seaweed stimulates females to oviposit and males to mount females. Females typically respond to male mounts by performing a violent rejection response. 2. Here the contribution of sexual conflict to the increase in mortality seen in the presence of seaweed was determined. The survival of single and mixed sex pairs of flies was followed in the presence and absence of seaweed. 3. The two sexes showed differential survival rates, with females living longer in the absence of seaweed. The presence of seaweed reduced survival in both sexes. In the presence of seaweed, female survival was lower when paired with a male. Over 40% of the reduction in survival in females in the presence of seaweed appears to be attributable to sexual conflict. 4. The presence of a female did not significantly affect male survival. Thus the mortality cost of being in the presence of the opposite sex and seaweed appears highly asymmetric. 5. In the presence of seaweed, female survival was lower when females were paired with small males. Small males exhibit higher levels of harassment of females, thus it is argued that pre‐copulatory sexual conflict is the probable cause of the increased mortality cost to females of being in the presence of both males and seaweed.
Groupwise functional analysis of gene variants is becoming standard in next-generation sequencing studies. As the function of many genes is unknown and their classification to pathways is scant, functional associations between genes are often inferred from large-scale omics data. Such data types, including protein–protein interactions and gene co-expression networks, are used to examine the interrelations of the implicated genes. Statistical significance is assessed by comparing the interconnectedness of the mutated genes with that of random gene sets. However, interconnectedness can be affected by confounding bias, potentially resulting in false positive findings. We show that genes implicated through de novo sequence variants are biased in their coding-sequence length and longer genes tend to cluster together, which leads to exaggerated p-values in functional studies; we present here an integrative method that addresses these biases. To discern molecular pathways relevant to complex disease, we have inferred functional associations between human genes from diverse data types and assessed them with a novel phenotype-based method. Examining the functional association between de novo gene variants, we control for the heretofore unexplored confounding bias in coding-sequence length. We test different data types and networks and find that the disease-associated genes cluster more significantly in an integrated phenotypic-linkage network than in other gene networks. We present a tool of superior power to identify functional associations among genes mutated in the same disease even after accounting for significant sequencing study bias and demonstrate the suitability of this method to functionally cluster variant genes underlying polygenic disorders.
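The idea of comparing a gene set's interconnectedness with random gene sets while controlling for coding-sequence length can be sketched as a length-matched permutation test. The abstract does not specify the procedure, so the quantile binning, the edge-count statistic, and all function and parameter names below are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def internal_edges(genes, edges):
    """Count network edges whose two endpoints both lie in `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def length_bins(cds_length, n_bins):
    """Assign each gene to a quantile bin of coding-sequence length."""
    ranked = sorted(cds_length, key=cds_length.get)
    return {g: i * n_bins // len(ranked) for i, g in enumerate(ranked)}


def length_matched_p_value(query, edges, cds_length,
                           n_bins=10, n_perm=1000, seed=0):
    """Empirical p-value for the interconnectedness of `query` (unique genes),
    using random gene sets with the same length-bin composition as the query."""
    rng = random.Random(seed)
    bin_of = length_bins(cds_length, n_bins)

    # Genes available for sampling in each length bin.
    members = defaultdict(list)
    for gene, b in bin_of.items():
        members[b].append(gene)

    # How many genes the query contributes to each bin.
    needed = defaultdict(int)
    for gene in query:
        needed[bin_of[gene]] += 1

    observed = internal_edges(query, edges)
    hits = 0
    for _ in range(n_perm):
        sample = []
        for b, k in needed.items():
            sample.extend(rng.sample(members[b], k))
        if internal_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Sampling within length bins means that a tendency of long genes to cluster together affects the null distribution as much as the observed set, so it no longer inflates the apparent significance. A real analysis would likely use weighted edges and finer matching than this sketch does.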
Background: Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic dwelling penguin species, the Adélie penguin [Pygoscelis adeliae] and emperor penguin [Aptenodytes forsteri].
Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human–chimpanzee and human–chimpanzee–gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution. The genome of a western lowland gorilla has been sequenced and analysed, completing the genome sequences of all great ape genera, and providing evidence for parallel accelerated evolution in chimpanzee, gorilla and human lineages at a number of loci. The genome of the gorilla has been sequenced, making it possible to compare the DNA of the four surviving hominid genera: human, chimpanzee, gorilla and orang-utan. 
The data — mainly from a female western lowland gorilla named Kamilah, but also from other gorillas representing both the western lowland and eastern lowland sub-species — imply that in almost one-third of its genome, the gorilla is closer to the human or chimpanzee than the human and chimp are to each other. Around 500 genes show accelerated evolution in gorilla, human and chimpanzee lineages, and there is evidence for parallel acceleration, particularly in genes associated with hearing. On the basis of genetic and fossil evidence, the authors suggest that the human–chimpanzee and human–chimpanzee–gorilla speciation events occurred at around 6 million and 10 million years ago respectively, whereas the two gorilla species diverged around 1.75 million years ago.