Inferring Population Structure and Admixture Proportions in Low Depth Next-Generation Sequencing Data

2018 
We present two new methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Probabilistic methods have therefore been employed to account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a new method for inferring population structure through PCA based on an iterative approach of estimating individual allele frequencies, and demonstrate greatly improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. Finally, we use the estimated individual allele frequencies in a new, fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework, available at http://www.popgen.dk/software/.
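To illustrate what "working directly on genotype likelihoods" can mean in practice, the following minimal sketch combines genotype likelihoods with a Hardy-Weinberg prior built from an allele frequency and returns the posterior expected genotype (dosage). It is an illustration under stated assumptions, not the PCAngsd implementation: the function names `expected_genotype` and `dosage_matrix` are hypothetical, the frequencies are assumed to lie strictly between 0 and 1, and the iterative re-estimation of individual allele frequencies described in the abstract is not shown.

```python
def expected_genotype(likes, freq):
    """Posterior mean genotype (dosage) for one individual at one site.

    likes: (P(data|G=0), P(data|G=1), P(data|G=2)), the genotype likelihoods.
    freq:  allele frequency used for the prior (assumed strictly in (0, 1)).
    """
    # Hardy-Weinberg prior on the genotypes 0, 1, 2 (an assumption of this sketch).
    prior = ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)
    # Bayes' rule: posterior is proportional to likelihood times prior.
    joint = [like * p for like, p in zip(likes, prior)]
    total = sum(joint)
    post = [j / total for j in joint]
    # Expected number of copies of the allele.
    return post[1] + 2 * post[2]


def dosage_matrix(likelihoods, freqs):
    """Apply expected_genotype to every individual (row) and site (column).

    likelihoods[i][j] is a 3-tuple of genotype likelihoods.
    freqs[i][j] is the allele frequency assumed for individual i at site j.
    """
    return [
        [expected_genotype(likes, f) for likes, f in zip(row_l, row_f)]
        for row_l, row_f in zip(likelihoods, freqs)
    ]
```

With uninformative likelihoods the dosage falls back to twice the assumed allele frequency, which is why the choice of frequency matters most when sequencing depth is low.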