A platform for case-control matching enables association studies without genotype sharing

2018 
Acquiring a sufficiently powered cohort of control samples can be time-consuming or, sometimes, impossible. Accordingly, the ability to leverage control samples that were already collected and sequenced elsewhere could dramatically improve power in all genetic association studies. However, because the majority of human DNA samples genotyped and sequenced to date are subject to strict data sharing regulations, large-scale sharing of samples, and of control samples in particular, is extremely challenging. Using insights from image recognition, we developed a method, compliant with personal genotype data protection restrictions, for selecting the best-matching controls from an external pool of samples. Our approach uses singular value decomposition of the matrix of case genotypes to rank controls in another study by their similarity to the cases. We demonstrate that this approach recovers an accurate case-control association analysis for both ultra-rare and common variants. We also implement, and provide online access to, a library of ~17,000 controls that enables association studies for case cohorts lacking control subjects.
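The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes genotypes coded as 0/1/2 allele counts, a case-side summary consisting only of the per-variant mean and the top-k right singular vectors, and a similarity score defined as the residual distance of each control from the case subspace (in the spirit of eigenface-style recognition). The function names, the choice of k, and the simulated allele frequencies are all hypothetical.

```python
import numpy as np


def case_basis(case_genotypes, k):
    """Summarize cases by their per-variant mean and top-k right singular vectors."""
    mean = case_genotypes.mean(axis=0)
    # SVD of the centered (samples x variants) case matrix.
    _, _, vt = np.linalg.svd(case_genotypes - mean, full_matrices=False)
    return mean, vt[:k]


def rank_controls(control_genotypes, mean, basis):
    """Order controls by distance from the case subspace (smaller = more case-like)."""
    centered = control_genotypes - mean
    # Part of each control that the case subspace can explain.
    projected = centered @ basis.T @ basis
    residual = np.linalg.norm(centered - projected, axis=1)
    return np.argsort(residual), residual


def simulate(rng, freqs, n):
    """Draw n genotype vectors (0/1/2 counts) from per-variant allele frequencies."""
    return rng.binomial(2, freqs, size=(n, freqs.size)).astype(float)


# Toy data (assumed, not from the paper): cases come from two related
# groups A1 and A2; the control pool also contains an unrelated group B.
rng = np.random.default_rng(0)
n_variants = 500
freq_a1 = rng.uniform(0.1, 0.9, n_variants)
freq_a2 = rng.uniform(0.1, 0.9, n_variants)
freq_b = rng.uniform(0.1, 0.9, n_variants)

cases = np.vstack([simulate(rng, freq_a1, 100), simulate(rng, freq_a2, 100)])
# Controls 0-99 share ancestry with the cases; controls 100-199 do not.
controls = np.vstack([
    simulate(rng, freq_a1, 50),
    simulate(rng, freq_a2, 50),
    simulate(rng, freq_b, 100),
])

# Only `mean` and `basis` would need to leave the case site in this sketch.
mean, basis = case_basis(cases, k=2)
order, residual = rank_controls(controls, mean, basis)
best_matches = order[:50]
```

In this sketch the case site exports only the mean vector and a few singular vectors, and the control site returns a ranking, so no individual genotypes are exchanged in either direction. Whether such a summary satisfies a given data protection regime is outside what the sketch can show.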