Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole Genome Sequencing Studies

2019 
Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used to analyze rare variants. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown as a priori, are difficult to specify in practice and are subject to limitations given genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally-efficient and dynamic scan statistic method (Scan the Genome (SCANG)) for analyzing WGS data that flexibly detects the sizes and the locations of rare-variants association regions without the need of specifying a prior fixed window size. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected rare variants association region sizes to vary across the genome. Through extensive simulated studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling for the genome-wise type I error rates. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    44
    References
    0
    Citations
    NaN
    KQI
    []