Evaluation of Model Fit of Inferred Admixture Proportions

2019 
Model-based methods for genetic clustering of individuals, such as those implemented in structure or ADMIXTURE, make it possible to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analyzed genetic data. One assumption is that all individuals derive from K homogeneous ancestral populations that are all well represented in the data; another is that no drift occurred after the admixture event. The histories of many real-world populations do not conform to this model, and in such cases taking the inferred admixture proportions at face value can be misleading. We propose a method to evaluate the fit of admixture models by calculating the genotypes predicted by the admixture model and obtaining the residuals as the difference between the true and predicted genotypes. The correlation of residuals between pairs of individuals can then be used as a measure of model fit. When the model assumptions are not violated and the inferred admixture proportions are accurate, the residuals from a pair of individuals are uncorrelated. When the fit is poor, individuals with similar histories have positively correlated residuals. Using simulated and real data, we show that the method detects a poor fit of inferred admixture proportions caused either by using an insufficient number of clusters K or by demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events, and non-discrete ancestral homogeneous populations.
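The residual-correlation idea described in the abstract can be illustrated with a minimal sketch. It assumes diploid genotypes coded 0/1/2, an admixture-proportion matrix Q (individuals by K) and an ancestral allele-frequency matrix F (K by sites), so that the predicted genotype is 2·(QF). The function name, the toy simulation of two unadmixed groups, and the K = 1 "bad fit" are illustrative assumptions, not the authors' implementation, and any bias correction a real implementation might apply when frequencies are estimated from the same individuals is omitted.

```python
import numpy as np

def residual_correlation(G, Q, F):
    """Pairwise correlation of genotype residuals under an admixture model.

    G: (n_ind, n_sites) genotypes coded 0/1/2
    Q: (n_ind, K) admixture proportions (rows sum to 1)
    F: (K, n_sites) ancestral allele frequencies
    """
    predicted = 2.0 * (Q @ F)      # expected genotype under the model
    residuals = G - predicted      # observed minus predicted
    return np.corrcoef(residuals)  # rows (individuals) are the variables

# Toy data (assumption): two unadmixed groups of 10 individuals each.
rng = np.random.default_rng(0)
n_per_group, n_sites = 10, 5000
F_true = rng.uniform(0.05, 0.95, size=(2, n_sites))
Q_true = np.repeat(np.eye(2), n_per_group, axis=0)   # shape (20, 2)
G = rng.binomial(2, Q_true @ F_true).astype(float)

# Good fit: residuals computed with the generating Q and F.
corr_good = residual_correlation(G, Q_true, F_true)

# Bad fit: a single cluster (K = 1) with pooled sample frequencies.
Q_one = np.ones((2 * n_per_group, 1))
F_one = G.mean(axis=0, keepdims=True) / 2.0
corr_bad = residual_correlation(G, Q_one, F_one)

# Masks for summarising pairs: same source group, excluding self-pairs.
same_group = (Q_true @ Q_true.T) > 0
off_diag = ~np.eye(2 * n_per_group, dtype=bool)

print("good fit, mean |r| over pairs:", np.abs(corr_good[off_diag]).mean())
print("bad fit, mean r within groups:", corr_bad[same_group & off_diag].mean())
print("bad fit, mean r between groups:", corr_bad[~same_group].mean())
```

In this toy setting, the well-specified model should give pairwise correlations close to zero, while the K = 1 fit should give positive correlations between individuals from the same group, which is the signature of poor fit described above.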