    Using ancestry-informative markers to define populations and detect population stratification
    Citations (137)
    References (45)
    Related Papers (10)
    Abstract:
    A serious problem with case-control studies is that population subdivision, recent admixture and sampling variance can lead to spurious associations between a phenotype and a marker locus, or indeed may mask true associations. This is also a concern in therapeutics since drug response may differ by ethnicity. Population stratification can occur if cases and controls have different frequencies of ethnic groups or, in admixed populations, different fractions of ancestry, and when phenotypes of interest such as disease, drug response or drug metabolism also differ between ethnic groups. Although most genetic variation is inter-individual, there is also significant inter-ethnic variation. The International HapMap Project has provided allele frequencies for approximately three million single nucleotide polymorphisms (SNPs) in Africans, Europeans and East Asians. SNP variation is greatest in Africans. Statistical methods for the detection and correction of population stratification, principally Structured Association and Genomic Control, have recently become freely available. These methods use marker loci spread throughout the genome that are unlinked to the candidate locus to estimate the ancestry of individuals within a sample, and to test for and adjust the ethnic matching of cases and controls. To date, few case-control association studies have incorporated testing for population stratification. This paper will focus on the debate about the quantity and methods for selection of highly informative marker loci required to characterize populations that vary in substructure or the degree of admixture, and will discuss how these theoretically desirable approaches can be effectively put into practice.
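The Genomic Control idea named in the abstract can be illustrated with a minimal sketch. It assumes each unlinked null marker has already been tested with a 1-degree-of-freedom chi-square association test, and it estimates an inflation factor as the median of those statistics divided by the median of the 1-df chi-square distribution. The function names are hypothetical and the code is not taken from the paper or from any particular software package.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(null_chi2):
    """Estimate the inflation factor from 1-df chi-square statistics
    computed at null markers unlinked to the candidate locus."""
    lam = median(null_chi2) / CHI2_1DF_MEDIAN
    # By convention the factor is not allowed to fall below 1,
    # so the correction never makes a test more liberal.
    return max(lam, 1.0)


def adjust_statistic(candidate_chi2, lam):
    """Deflate the candidate-locus statistic by the inflation factor."""
    return candidate_chi2 / lam


if __name__ == "__main__":
    # Hypothetical null-marker statistics, for illustration only.
    null_stats = [0.2, 0.5, 0.9, 1.4, 0.7, 0.3, 1.1]
    lam = genomic_control_lambda(null_stats)
    print(lam, adjust_statistic(6.0, lam))
```

An inflation factor well above 1 suggests that stratification (or another systematic bias) is inflating the test statistics across the genome, which is the signal these methods are designed to detect.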
    Keywords:
    Population stratification
    Ancestry-informative marker
    International HapMap Project
    Genetic Association
    Genetic genealogy
    Background: Population stratification is the main source of spurious results and poor reproducibility in genetic association findings. Population heterogeneity can be controlled for by grouping individuals in ethnic clusters; however, in admixed populations, there is evidence that such proxies do not provide efficient stratification control. The aim of this study was to evaluate the relation of self-reported with genetic ancestry and the statistical risk of grouping an admixed sample based on self-reported ancestry. Methods: A questionnaire that included an item on self-reported ancestry was completed by 189 female volunteers from an admixed Brazilian population. Individual genetic ancestry was then determined by genotyping ancestry informative markers. Results: Self-reported ancestry was classified as white, intermediate, and black. The mean difference among self-reported groups was significant for European and African, but not Amerindian, genetic ancestry. Pairwise fixation index analysis revealed a significant difference among groups. However, the increase in the chance of type 1 error was estimated to be 14%. Conclusions: Self-reporting of ancestry was not an appropriate methodology to cluster groups in a Brazilian population, due to high variance at the individual level. Ancestry informative markers are more useful for quantitative measurement of biological ancestry.
    Genetic genealogy
    Citations (56)
    Some investigators argue that controlling for self-reported race or ethnicity, either in statistical analysis or in study design, is sufficient to mitigate unwanted influence from population stratification. In this report, we evaluated the effectiveness of a study design involving matching on self-reported ethnicity and race in minimizing bias due to population stratification within an ethnically admixed population in California. We estimated individual genetic ancestry using structured association methods and a panel of ancestry informative markers, and observed no statistically significant difference in distribution of genetic ancestry between cases and controls (P=0.46). Stratification by Hispanic ethnicity showed similar results. We evaluated potential confounding by genetic ancestry after adjustment for race and ethnicity for 1260 candidate gene SNPs, and found no major impact (>10%) on risk estimates. In conclusion, we found no evidence of confounding of genetic risk estimates by population substructure using this matched design. Our study provides strong evidence supporting the race- and ethnicity-matched case-control study design as an effective approach to minimizing systematic bias due to differences in genetic ancestry between cases and controls.
    Population stratification
    Genetic genealogy
    Ancestry-informative marker
    Stratification (seeds)
    Citations (11)
    Genetic association studies can be used to identify factors that may contribute to disparities in disease evident across different racial and ethnic populations. However, such studies may not account for potential confounding if study populations are genetically heterogeneous. Racial and ethnic classifications have been used as proxies for genetic relatedness. We investigated genetic admixture and developed a questionnaire to explore variables used in constructing racial identity in two cohorts: 50 African Americans and 40 Nigerians. Genetic ancestry was determined by genotyping 107 ancestry informative markers. Ancestry estimates calculated with maximum likelihood estimation were compared with population stratification detected with principal components analysis. Ancestry was approximately 95% west African, 4% European, and 1% Native American in the Nigerian cohort and 83% west African, 15% European, and 2% Native American in the African American cohort. Therefore, self-identification as African American agreed well with inferred west African ancestry. However, the cohorts differed significantly in mean percentage west African and European ancestries (P < 0.0001) and in the variance for individual ancestry (P ≤ 0.01). Among African Americans, no set of questionnaire items effectively estimated degree of west African ancestry, and self-report of a high degree of African ancestry in a three-generation family tree did not accurately predict degree of African ancestry. Our findings suggest that self-reported race and ancestry can predict ancestral clusters but do not reveal the extent of admixture. Genetic classifications of ancestry may provide a more objective and accurate method of defining homogeneous populations for the investigation of specific population-disease associations. (Cancer Epidemiol Biomarkers Prev 2008;17(6):1329–38)
    Ancestry-informative marker
    Genetic genealogy
    Population stratification
    Genetic admixture
    Some studies of polymorphisms in prostate cancer (PCa) analyze individuals in a uniform manner, regardless of genetic ancestry. However, PCa aggressiveness differs between subjects of African descent and those of European extraction. Thus, genetic ancestry
    Population stratification
    Ancestry-informative marker
    Genetic genealogy
    Genetic admixture
    White (mutation)
    Citations (6)
    The vitamin D receptor (VDR) is an essential protein related to bone metabolism. Some VDR alleles are differentially distributed among ethnic populations and display variable patterns of linkage disequilibrium (LD). In this study, 200 unrelated Brazilians were genotyped using 21 VDR single nucleotide polymorphisms (SNPs) and 28 ancestry informative markers. The patterns of LD and haplotype distribution were compared among the Brazilian sample and the HapMap populations of African (YRI), European (CEU) and Asian (JPT+CHB) origins. Conditional regression and haplotype-specific analysis were performed using estimates of individual genetic ancestry in Brazilians as a quantitative trait. Similar patterns of LD were observed in the 5' and 3' gene regions. However, the frequency distribution of haplotype blocks varied among populations. Conditional regression analysis identified haplotypes associated with European and Amerindian ancestry, but not with the proportion of African ancestry. Individual ancestry estimates were associated with VDR haplotypes. These findings reinforce the need to correct for population stratification when performing genetic association studies in admixed populations.
    International HapMap Project
    Ancestry-informative marker
    Linkage Disequilibrium
    Population stratification
    Genetic genealogy
    Genetic admixture
    Genetic Association
    The presence of population structure in a sample may confound the search for important genetic loci associated with disease. Our four samples in the Family Investigation of Nephropathy and Diabetes (FIND), European Americans, Mexican Americans, African Americans, and American Indians, are part of a genome-wide association study in which population structure might be particularly important. We therefore decided to study in detail one component of this, individual genetic ancestry (IGA). From SNPs present on the Affymetrix 6.0 Human SNP array, we identified 3 sets of ancestry informative markers (AIMs), each maximized for the information in one of the three contrasts among ancestral populations: Europeans (HAPMAP, CEU), Africans (HAPMAP, YRI and LWK), and Native Americans (full heritage Pima Indians). We estimate IGA and present an algorithm for their standard errors, compare IGA to principal components, emphasize the importance of balancing information in the ancestry informative markers (AIMs), and test the association of IGA with diabetic nephropathy in the combined sample. A fixed parental allele maximum likelihood algorithm was applied to the FIND to estimate IGA in four samples: 869 American Indians; 1385 African Americans; 1451 Mexican Americans; and 826 European Americans. When the information in the AIMs is unbalanced, the estimates are incorrect with large error. Individual genetic admixture is highly correlated with principal components for capturing population structure. It takes ~700 SNPs to reduce the average standard error of individual admixture below 0.01. When the samples are combined, the resulting population structure creates associations between IGA and diabetic nephropathy. The identified set of AIMs, which includes American Indian parental allele frequencies, may be particularly useful for estimating genetic admixture in populations from the Americas. Failure to balance information in maximum likelihood, poly-ancestry models creates biased estimates of individual admixture with large error. This also occurs when estimating IGA using the Bayesian clustering method as implemented in the program STRUCTURE. Odds ratios for the associations of IGA with disease are consistent with what is known about the incidence and prevalence of diabetic nephropathy in these populations.
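To make the notion of maximum likelihood estimation with fixed parental allele frequencies concrete, here is a minimal sketch. It is not the FIND authors' algorithm: it assumes biallelic markers, genotypes coded as 0, 1 or 2 copies of a reference allele, independent loci, and it maximizes the likelihood with a simple EM update. The function name, arguments and the toy data are all hypothetical.

```python
def estimate_admixture(genotypes, freqs, n_iter=200, eps=1e-6):
    """Estimate one individual's ancestry proportions by EM.

    genotypes: list of reference-allele counts (0, 1 or 2), one per marker.
    freqs: list of rows, one per marker; each row holds the reference-allele
           frequency in each parental population (held fixed throughout).
    Returns a list of ancestry proportions that sums to 1.
    """
    n_pops = len(freqs[0])
    # Clamp frequencies away from 0 and 1 so no division by zero can occur.
    p = [[min(max(x, eps), 1.0 - eps) for x in row] for row in freqs]
    q = [1.0 / n_pops] * n_pops  # start from equal ancestry proportions

    for _ in range(n_iter):
        counts = [0.0] * n_pops
        for g, row in zip(genotypes, p):
            # Probability that a random allele copy is the reference allele.
            f = sum(qk * pk for qk, pk in zip(q, row))
            for k in range(n_pops):
                # Expected number of this individual's allele copies
                # that derive from population k at this marker.
                counts[k] += g * q[k] * row[k] / f
                counts[k] += (2 - g) * q[k] * (1.0 - row[k]) / (1.0 - f)
        q = [c / (2.0 * len(genotypes)) for c in counts]
    return q


if __name__ == "__main__":
    # Toy example: 20 markers with frequency 0.9 in population 0
    # and 0.1 in population 1; the individual is homozygous everywhere.
    toy_freqs = [[0.9, 0.1]] * 20
    print(estimate_admixture([2] * 20, toy_freqs))
```

In this toy setting every marker carries the same amount of information about the same contrast. The abstract's point about balance concerns the harder case of three or more parental populations, where a panel that is informative for one contrast but weak for another can bias the estimates.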
    International HapMap Project
    Ancestry-informative marker
    Genetic genealogy
    Population stratification
    Genetic Association
    Genetic admixture
    SNP
    Genome-wide Association Study
    Citations (1)
    Ancestry-informative marker
    International HapMap Project
    Population stratification
    Linkage Disequilibrium
    Genetic genealogy
    Genetic Association
    Accurate, high-throughput genotyping allows the fine characterization of genetic ancestry. Here we applied recently developed statistical and computational techniques to the question of African ancestry in African Americans by using data on more than 450,000 single-nucleotide polymorphisms (SNPs) genotyped in 94 Africans of diverse geographic origins included in the HGDP, as well as 136 African Americans and 38 European Americans participating in the Atherosclerotic Disease Vascular Function and Genetic Epidemiology (ADVANCE) study. To focus on African ancestry, we reduced the data to include only those genotypes in each African American determined statistically to be African in origin. From cluster analysis, we found that all the African Americans are admixed in their African components of ancestry, with the majority contributions being from West and West-Central Africa, and only modest variation in these African-ancestry proportions among individuals. Furthermore, by principal components analysis, we found little evidence of genetic structure within the African component of ancestry in African Americans. These results are consistent with historic mating patterns among African Americans that are largely uncorrelated to African ancestral origins, and they cast doubt on the general utility of mtDNA or Y-chromosome markers alone to delineate the full African ancestry of African Americans. Our results also indicate that the genetic architecture of African Americans is distinct from that of Africans, and that the greatest source of potential genetic stratification bias in case-control studies of African Americans derives from the proportion of European ancestry.
    Ancestry-informative marker
    Genetic genealogy
    Genetic admixture
    Population stratification
    Out of africa
    Uncorrelated
    Citations (180)
    Introduction: Human populations are often highly structured due to differences in genetic ancestry among groups, posing difficulties in associating genes with diseases. Ancestry-informative markers (AIMs) aid in the detection of population stratification and provide an alternative approach to map population-specific alleles to disease. Here, we identify and characterize a novel set of African AIMs that separate populations of African ancestry from other global populations including those of European ancestry. Methods: Using data from the 1000 Genomes Project, highly informative SNP markers from five African subpopulations were selected based on estimates of informativeness (In) and compared against the European population to generate a final set of 46,737 African ancestry-informative markers (AIMs). The AIMs identified were validated using an independent set and functionally annotated using tools such as SIFT and PolyPhen. They were also investigated for representation on commonly used SNP arrays. Results: This set of African AIMs effectively separates populations of African ancestry from other global populations and further identifies substructure between populations of African ancestry. When a subset of these AIMs was studied in an independent dataset, they differentiated people who self-identify as African American or Black from those who identify their ancestry as primarily European. Most of the AIMs were found in intergenic and intronic regions, with only 0.6% in the coding regions of the genome. Most of the commonly used SNP arrays investigated contained less than 10% of the AIMs. Discussion: While several functional annotations of both coding and non-coding African AIMs are supported by the literature and link these high-frequency African alleles to diseases in African populations, more effort is needed to map genes to diseases in these genetically diverse subpopulations. The relative dearth of these African AIMs on current genotyping platforms (the array with the highest fraction, Illumina's Omni 5, harbors less than a quarter of AIMs) further demonstrates a greater need to better represent historically understudied populations.
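As an illustration of ranking markers by informativeness, the sketch below assumes that In refers to the informativeness-for-assignment statistic of Rosenberg et al. (2003), computed here for a single biallelic SNP from the frequency of one allele in each of K populations. The abstract does not give the formula, so this reading is an assumption, and the function name is hypothetical.

```python
from math import log


def informativeness(freqs):
    """Informativeness for assignment (natural-log units) of one biallelic SNP.

    freqs: frequency of one allele in each of K populations.
    The value is 0 when all populations share the same frequency and
    reaches log(K) only when alleles are private to single populations.
    """
    k = len(freqs)
    total = 0.0
    # Sum the contribution of both alleles of the SNP.
    for allele_freqs in (freqs, [1.0 - f for f in freqs]):
        mean = sum(allele_freqs) / k
        if mean > 0:
            total -= mean * log(mean)
        # Terms with zero frequency contribute nothing (0 * log 0 := 0).
        total += sum(f * log(f) for f in allele_freqs if f > 0) / k
    return total


if __name__ == "__main__":
    # Hypothetical allele frequencies in two populations.
    print(informativeness([0.95, 0.10]))  # strongly differentiated marker
    print(informativeness([0.30, 0.30]))  # uninformative marker
```

Under this assumption, a marker-selection step would compute the statistic for every candidate SNP and keep those above a chosen threshold, which is one plausible reading of the selection described in the Methods.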
    Ancestry-informative marker
    Genetic genealogy
    Population stratification
    1000 Genomes Project
    SNP
    Genome-wide Association Study