Accurate, scalable cohort variant calls using DeepVariant and GLnexus

2020 
Population-scale sequenced cohorts are foundational resources for many genetic analyses, but creating them from single-sample variant calls remains challenging. Here we introduce an open-source cohort-calling method that uses the highly accurate germline caller DeepVariant and the scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimized the method across a range of cohort sizes, sequencing methods, and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices. We further evaluated the DeepVariant+GLnexus pipeline in the deeply sequenced 1000 Genomes Project phase 3 samples (1KGP) and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline. We publicly release the 1KGP individual-level variant calls and cohort callset to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation.
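As an illustration of the trio-based metric mentioned above, the following is a minimal sketch of a Mendelian consistency check. It assumes autosomal, diploid, unphased genotypes given as tuples of allele indices; the function names and the input layout are hypothetical and are not taken from the DeepVariant or GLnexus code, and the exact metric definition used in the paper may differ.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's unphased diploid genotype can be formed by
    taking one allele from the father and one from the mother.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a heterozygous
    call. Assumption: autosomal diploid sites only.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def mendelian_violation_rate(trio_calls):
    """Fraction of fully genotyped trio sites that violate Mendelian inheritance.

    trio_calls is an iterable of (father, mother, child) genotype tuples.
    Sites where any genotype is missing (None) are skipped.
    """
    checked = 0
    violations = 0
    for father, mother, child in trio_calls:
        if father is None or mother is None or child is None:
            continue
        checked += 1
        if not is_mendelian_consistent(father, mother, child):
            violations += 1
    return violations / checked if checked else 0.0
```

A lower violation rate across trios suggests a more internally consistent cohort callset, although true de novo mutations will also register as violations under this simple check.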