Children of consanguineous unions carry long runs of homozygosity (ROH) in their genomes, due to their parents' recent shared ancestry. This increases the burden of recessive disease in populations with high levels of consanguinity and has been heavily studied in some groups. However, there has been little investigation of the broader effect of consanguinity on patterns of genetic variation on a global scale. This study, which collected published genetic data and information about marriage practice from 395 worldwide populations, shows that reported preference for cousin marriage has a detectable association with the distribution of long ROH in this sample, increasing the expected number of ROH longer than 10 cM by a factor of 2.2. Variation in marriage practice and consequent rates of consanguinity are therefore an important aspect of demographic history for the purposes of modeling human genetic variation. However, reported marriage practices explain a relatively small proportion of the variation in ROH distribution, and consequently, population genetic data are only partially informative about cultural preferences.
Abstract Mitochondrial DNA copy number (mtCN) is often treated as a proxy for mitochondrial (dys)function and disease risk. Pathological changes in mtCN are common symptoms of rare mitochondrial disorders, but reported associations between mtCN and common diseases vary considerably across studies. We sought to understand the biology of mtCN by carrying out genome- and phenome-wide association studies of mtCN in 30,666 individuals from the Penn Medicine BioBank (PMBB), a large, diverse cohort of largely African and European ancestry. We estimated mtCN in peripheral blood using exome sequence data, taking into account the effects of blood cell composition, particularly neutrophil and platelet counts. We replicated known genetic associations of mtCN in the PMBB and found that their effect sizes are highly correlated between individuals of European and African ancestry. However, the heritability of mtCN was much higher among individuals of largely African ancestry (h² = 0.3) than among individuals of European ancestry (h² = 0.1). Admixture mapping suggests that there are undiscovered variants underlying mtCN that are differentiated in frequency between individuals with African and European ancestry. We further show that mtCN is associated with many health-related phenotypes. We discovered robust associations between mtDNA copy number and diseases of metabolically active tissues, such as cardiovascular disease and liver damage, that were consistent across African and European ancestry individuals. Other associations, such as those with epilepsy, prostate cancer, and disorders of iron metabolism, were discovered only in individuals of either European or African ancestry, but not both. Even though we replicate known genetic and phenotypic associations of mtCN, we demonstrate that they are sensitive to blood cell composition and environmental modifiers, explaining why such associations are inconsistent across studies.
Peripheral blood mtCN might therefore be used as a biomarker of mitochondrial dysfunction and disease risk, but such associations must be interpreted with care.
Abstract Time series data of allele frequencies are a powerful resource for detecting and classifying natural and artificial selection. Ancient DNA now allows us to observe these trajectories in natural populations of long-lived species such as humans. Here, we develop a hidden Markov model to infer selection coefficients that vary over time. We show through simulations that our approach can accurately estimate both selection coefficients and the timing of changes in selection. Finally, we analyze some of the strongest signals of selection in the human genome using ancient DNA. We show that the European lactase persistence mutation was selected over the past 5,000 years with a selection coefficient of 2-2.5% in Britain, Central Europe and Iberia, but not Italy. In northern East Asia, selection at the ADH1B locus associated with alcohol metabolism intensified around 4,000 years ago, approximately coinciding with the introduction of rice-based agriculture. Finally, a derived allele at the FADS locus was selected in parallel in both Europe and East Asia, as previously hypothesized. Our approach is broadly applicable to both natural and experimental evolution data and shows how time series data can be used to resolve fine-scale details of selection.
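As a rough illustration of what a selection coefficient of 2-2.5% sustained over about 5,000 years implies, the following sketch iterates a simple deterministic genic selection recursion. It is not the hidden Markov model described above. The starting frequency, the 29-year generation time, and the function name are assumptions chosen for illustration only.

```python
def allele_frequency_trajectory(p0, s, generations):
    """Return allele frequencies for generations 0..generations under
    deterministic genic selection with a constant selection coefficient s."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Carriers of the selected allele have relative fitness 1 + s.
        p = p * (1.0 + s) / (1.0 + s * p)
        freqs.append(p)
    return freqs


# Assumed, illustrative inputs (not taken from the study):
YEARS = 5000
GENERATION_TIME = 29   # years per generation, an assumption
START_FREQ = 0.01      # hypothetical starting frequency

n_gen = round(YEARS / GENERATION_TIME)
for s in (0.02, 0.025):
    final = allele_frequency_trajectory(START_FREQ, s, n_gen)[-1]
    print(f"s = {s:.3f}: frequency after {n_gen} generations = {final:.2f}")
```

In this recursion the odds p/(1-p) are multiplied by 1+s each generation, so small differences in s compound substantially over a few thousand years. Genetic drift, which a model fitted to real time series data has to account for, is ignored here.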
Abstract Children of consanguineous unions carry long runs of homozygosity (ROH) in their genomes, due to their parents’ recent shared ancestry. This increases the burden of recessive disease in populations with high levels of consanguinity and has been heavily studied in some groups. However, there has been little investigation of the broader effect of consanguinity on patterns of genetic variation on a global scale. Here, we collect published genetic data and information about marriage practices from 396 worldwide populations and show that preference for cousin marriage has a detectable effect on the distribution of long ROH in these samples, increasing the expected number of ROH longer than 10 Mb by a factor of 1.5 (P = 2.3 × 10⁻⁴). Variation in marriage practice and consequent rates of consanguinity is therefore an important aspect of demographic history for the purposes of modeling human genetic variation. However, marriage practices explain a relatively small proportion of the variation in ROH distribution, and consequently the ability to predict marriage practices from population genetic samples (for example, of ancient populations) is limited.
Abstract Most variants identified in human genome-wide association studies and scans for selection are non-coding. Interpretation of these variants’ effects and understanding of the way in which they contribute to phenotypic variation and adaptation in human populations is therefore limited by our understanding of gene regulation and by the difficulty in confidently linking non-coding variants to genes. To overcome this, we developed a gene-by-gene test for population-specific selection based on combinations of regulatory variants. We extended the Q_X test for polygenic selection to test for selection on regulatory variants for 17,388 protein-coding genes across 2,504 individuals. We identified 45 genes with significant evidence (FDR < 0.1) for selection, including FADS1, KHK, SULT1A2, ITGAM, and genes in the HLA region. We further confirm that significant selection signals do correspond to plausible population-level differences in predicted expression. However, we find that very few (0.2%) genes have strong evidence for directional, population-specific selection on the component of their expression that is predicted by cis-regulatory variants. While this is consistent with most cis-regulatory variation evolving under genetic drift or stabilizing selection, it is also possible that any effects are smaller than we can detect, or that population-specific selection is driven by tissue-specific or trans effects. Our gene-level Q_X score is independent of other methods for detecting selection based on genomic variation and may therefore be useful when used in combination with more traditional selection tests to specifically identify selection on regulatory variation. Overall, our results demonstrate the utility of one approach to combining population-level information with functional data to understand the evolution of gene expression.
Abstract The FADS locus contains the genes FADS1 and FADS2 that encode enzymes involved in the synthesis of long-chain polyunsaturated fatty acids. This locus appears to have been a repeated target of selection in human evolution, likely because dietary input of long-chain polyunsaturated fatty acids varied over time depending on environment and subsistence strategy. Several recent studies have identified selection at the FADS locus in Native American populations, interpreted as evidence for adaptation during or subsequent to the passage through Beringia. Here, we show that these signals are confounded by independent selection—postdating the split from Native Americans—in the European and, possibly, the East Asian populations used in the population branch statistic test. This is supported by direct evidence from ancient DNA that one of the putatively selected haplotypes was already common in Northern Eurasia at the time of the separation of Native American ancestors. An explanation for the present-day distribution of the haplotype that is more consistent with the data is that Native Americans retain the ancestral state of Paleolithic Eurasians. Another haplotype at the locus may reflect a secondary selection signal, although its functional impact is unknown.
CHC22 clathrin plays a key role in intracellular membrane trafficking of the insulin-responsive glucose transporter GLUT4, and so in post-prandial clearance of glucose from human blood. We performed population genetic and phylogenetic analyses of the CLTCL1 gene, encoding CHC22, to gain insight into its functional evolution. Analysis of 58 vertebrate genomes showed independent loss of CLTCL1 in at least two lineages after it arose from a gene duplication during the emergence of jawed vertebrates. All vertebrates studied retain the parent CLTC gene encoding CHC17 clathrin, which mediates endocytosis and other housekeeping pathways of membrane traffic, as performed by the single type of clathrin in non-vertebrate eukaryotes. For those species retaining CLTCL1, preservation of CHC22 functionality was supported by strong evidence for purifying selection over phylogenetic timescales, as seen for CLTC. Nonetheless, CLTCL1 showed considerably greater allelic diversity than CLTC in humans and chimpanzees. In all human population samples studied, two allelic variants of CLTCL1 segregate at high frequency, encoding CHC22 proteins with either methionine or valine at position 1316. Balancing selection of these two allotypes is inferred, with V1316 being more frequent in farming populations, when compared to hunter-gatherers, and originating an estimated 500-50 thousand years ago. Functional studies indicate that V1316-CHC22 is less effective at controlling GLUT4 membrane traffic than M1316-CHC22, leading to an attenuated insulin-regulated response, consistent with structural predictions and measurable differences in cellular dynamics of the two variants. These analyses suggest that CHC22 clathrin was subject to selection in humans with different diets, leading to allotypes that affect its role in nutrient metabolism and have potential to differentially influence the human insulin response.
Abstract Archaeogenetic studies have described the formation of Eurasian ‘steppe ancestry’ as a mixture of Eastern and Caucasus hunter-gatherers. However, it remains unclear when and where this ancestry arose and whether it was related to a horizon of cultural innovations in the 4th millennium BCE that subsequently facilitated the advance of pastoral societies likely linked to the dispersal of Indo-European languages. To address this, we generated genome-wide SNP data from 45 prehistoric individuals along a 3000-year temporal transect in the North Caucasus. We observe a genetic separation between the groups of the Caucasus and those of the adjacent steppe. The Caucasus groups are genetically similar to contemporaneous populations south of it, suggesting that – unlike today – the Caucasus acted as a bridge rather than an insurmountable barrier to human movement. The steppe groups from Yamnaya and subsequent pastoralist cultures show evidence for previously undetected farmer-related ancestry from different contact zones, while Steppe Maykop individuals harbour additional Upper Palaeolithic Siberian and Native American related ancestry.