High-throughput and Efficient Multilocus Genome-Wide Association Study on Longitudinal Outcomes

2020 
MOTIVATION: With the emergence of high-dimensional genomic data, genetic analyses such as genome-wide association studies (GWAS) have played an important role in identifying disease-related genetic variants and novel treatments. Complex longitudinal phenotypes are commonly collected in medical studies. However, because few analytical approaches are available for longitudinal traits, these data are often underutilized. In this manuscript, we develop a high-throughput machine learning approach for multilocus GWAS of longitudinal traits by coupling Empirical Bayesian Estimates (EBEs) from mixed-effects modeling with a novel l0-norm algorithm.

RESULTS: Extensive simulations demonstrated that the proposed approach not only selects SNPs accurately, with comparable or higher power, but also robustly controls false positives. More importantly, the approach is highly scalable and can be more than 1000 times faster than recently published approaches, making genome-wide multilocus analysis of longitudinal traits feasible. In addition, it can simultaneously analyze millions of SNPs if computer memory allows, potentially enabling a true multilocus analysis of high-dimensional genomic data. In an application to data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we confirmed that our approach identifies well-known SNPs associated with Alzheimer's disease (AD) and is much faster (≥ 6000 times) than recently published approaches.

AVAILABILITY: The source code and the testing datasets are available at https://github.com/Myuan2019/EBE_APML0.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
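The two-stage idea described above (summarize each subject's longitudinal trajectory with an EBE, then run l0-constrained multilocus regression of the EBEs on SNPs) can be illustrated with a minimal sketch. This is not the authors' EBE-APML0 implementation. Stage 1 assumes a simple random-intercept model with known variance components and, as a simplification, estimates the fixed intercept by the unweighted mean of subject means. Stage 2 swaps in iterative hard thresholding, a generic solver for least squares under an l0 constraint, in place of the paper's l0-norm algorithm. All function names (`random_intercept_ebe`, `hard_threshold`, `iht_l0`) are hypothetical.

```python
import numpy as np


def random_intercept_ebe(y_list, sigma2_b, sigma2_e):
    """Empirical Bayes (BLUP) estimates of subject-level random intercepts.

    Assumed model (illustrative only): y_ij = mu + b_i + e_ij, with
    b_i ~ N(0, sigma2_b), e_ij ~ N(0, sigma2_e), variance components known.
    Simplification: mu is the unweighted mean of the subject means.
    """
    means = np.array([np.mean(y) for y in y_list])
    sizes = np.array([len(y) for y in y_list], dtype=float)
    mu = means.mean()
    # Shrinkage factor: subjects with fewer visits are pulled harder toward mu.
    shrink = sizes * sigma2_b / (sizes * sigma2_b + sigma2_e)
    return shrink * (means - mu)


def hard_threshold(beta, k):
    """Keep the k entries of largest magnitude and set the rest to zero."""
    out = np.zeros_like(beta)
    if k <= 0:
        return out
    keep = np.argsort(np.abs(beta))[-k:]
    out[keep] = beta[keep]
    return out


def iht_l0(X, y, k, n_iter=500, tol=1e-10):
    """Iterative hard thresholding for min ||y - X b||^2 s.t. ||b||_0 <= k."""
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Gradient step on the squared loss, then project onto k-sparse vectors.
        new_beta = hard_threshold(beta + step * X.T @ (y - X @ beta), k)
        if np.linalg.norm(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta


if __name__ == "__main__":
    # Stage 1 on toy longitudinal data: two subjects with 2 and 3 visits.
    ebe = random_intercept_ebe([np.array([1.0, 3.0]), np.array([5.0, 7.0, 9.0])],
                               sigma2_b=1.0, sigma2_e=1.0)
    print(ebe)

    # Stage 2 on a toy noise-free design with orthonormal columns.
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((100, 20)))
    beta_true = np.zeros(20)
    beta_true[[3, 11]] = [2.0, -1.5]
    est = iht_l0(Q, Q @ beta_true, k=2)
    print(sorted(np.flatnonzero(est).tolist()))  # [3, 11]
```

In a GWAS setting under these assumptions, `y` would be the vector of per-subject EBEs and `X` a standardized genotype matrix. The l0 constraint `k` directly caps how many SNPs enter the multilocus model, which is the contrast with l1-type penalties that shrink all coefficients.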