Prediction and replication from case-control sequencing studies using custom genotyping and additional sequencing

2013 
We present two results about using allele-count (AC) burdens of rare SNPs discovered in a case-control sequencing study for prediction or validation in an external prospective study. When genotyping only the SNPs that are polymorphic in the sequence data, the phenotype-AC correlation tends to be larger in the replication data than in the primary study. Conversely, if the replication sample is sequenced, ACs of SNPs that are novel in the replication tend to have much smaller or opposite-signed associations. We explain this by first deriving the AC-phenotype association implied by a model of diverse SNP effects, and second by accounting for the shifted distribution of SNP effects when a case-control study is used as a filter for SNP inclusion. In rare diseases, the case population is depleted of protective SNPs and enriched for deleterious SNPs, which creates the above difference in AC associations. This phenomenon is most relevant in re-sequencing for risk prediction in rare diseases with heterogeneous rare mutations, because it applies to SNPs with minor allele frequency (MAF) near 1 divided by the case-control sample size and is exaggerated when SNP log-odds ratios come from a heavy-tailed distribution. It also suggests a "winner's curse" in which most risk-increasing SNPs at a particular MAF are quickly discovered and future sequencing finds more protective or irrelevant SNPs.
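The filtering effect described above can be illustrated with a small deterministic calculation. This is a minimal sketch under stated assumptions, not the derivation used in the paper: the function names, the five-point symmetric grid of log-odds ratios, and the sample sizes are all hypothetical, and the case allele frequency is obtained from the control frequency by applying the allelic odds ratio. It computes the chance that a SNP is seen at least once in the case-control sample, then averages the log-odds ratios over discovered and undiscovered SNPs.

```python
import math

def discovery_probability(maf, beta, n_cases, n_controls):
    """Chance that the minor allele appears at least once in the sample.

    Assumption: controls carry the allele at frequency `maf`, and cases at
    the frequency implied by an allelic odds ratio of exp(beta).
    """
    odds_ratio = math.exp(beta)
    case_freq = maf * odds_ratio / (1.0 - maf + maf * odds_ratio)
    # Each person contributes two chromosomes.
    p_absent = (1.0 - maf) ** (2 * n_controls) * (1.0 - case_freq) ** (2 * n_cases)
    return 1.0 - p_absent

def mean_effects(maf, betas, n_cases, n_controls):
    """Mean log-odds ratio among discovered and among undiscovered SNPs.

    Assumption: every value in `betas` is equally likely a priori.
    """
    probs = [discovery_probability(maf, b, n_cases, n_controls) for b in betas]
    found = sum(p * b for p, b in zip(probs, betas)) / sum(probs)
    missed = sum((1.0 - p) * b for p, b in zip(probs, betas)) / sum(1.0 - p for p in probs)
    return found, missed

# Hypothetical setting: 500 cases, 500 controls, MAF of 1 / sample size,
# and a symmetric set of effects centered on zero.
n_cases = n_controls = 500
maf = 1.0 / (n_cases + n_controls)
betas = [-2.0, -1.0, 0.0, 1.0, 2.0]

found, missed = mean_effects(maf, betas, n_cases, n_controls)
print(f"mean log-odds, discovered SNPs:   {found:+.3f}")
print(f"mean log-odds, undiscovered SNPs: {missed:+.3f}")
```

Although the assumed effects are symmetric around zero, the discovered set leans toward risk-increasing SNPs and the undiscovered set toward protective ones, which is the direction of the shift the abstract describes for SNPs that are novel in a sequenced replication sample.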