Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC) significantly improves the accuracy of predicting geographic origin of individuals

2021 
Geographic patterns of human genetic variation provide important insights into human evolution and disease. Commonly used tools for detecting geographic patterns in genetic data are principal components analysis (PCA) and a hybrid approach, linear discriminant analysis of principal components (DAPC). However, the genetic features produced by both approaches are only linear combinations of genotypes, which inevitably miss nonlinear patterns hidden in genetic variation and can fail to characterize the correct population structure in more complex cases. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a nonlinear approach for inferring individual geographic genetic structure that addresses the limitations of these linear approaches by preserving the nonlinear information and the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using simulations and real data sets. Simulation results showed that KLFDAPC significantly improved population separability compared with PCA and DAPC. Application to the POPRES and CONVERGE datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin compared with PCA and DAPC. KLFDAPC can therefore be useful for geographic ancestry inference, the design of genome scans, and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.