Abstract 5463: Accuracy improvements in somatic whole-genome small-variant calling with the DRAGEN platform

2020 
Introduction: Next-generation whole-genome sequencing promises to enable dramatic expansion of precision oncology and personalized cancer care. To support this effort, it is critical to develop computational tools that can analyze sequence data accurately and with rapid turnaround time. One particularly challenging problem is the calling of somatic variants in matched tumor and normal samples. We present the DRAGEN™ 3.5 somatic pipeline, which performs 110x/40x whole-genome end-to-end analysis in under two hours with accuracy superior to that of all other tools we have benchmarked, including Mutect2/GATK4 and Strelka2. In addition, DRAGEN 3.5 is robust against variations in coverage, sequencing platform, sample preparation chemistry, and tumor purity. It also tolerates tumor-in-normal contamination, thereby making the pipeline applicable to late-stage solid tumors or hematological cancers. Methods: DRAGEN 3.5 replaces the legacy genotyping model (originally developed as part of MuTect2) with that of Strelka2. Whereas the MuTect2 model performs separate analyses on the tumor and normal samples, the Strelka2 model performs a joint analysis, allowing (1) detection of systematic errors that affect both samples simultaneously and (2) modeling of tumor-in-normal contamination, where the amount of contamination in the normal sample depends on the allele frequency observed in the tumor sample at the locus in question. In addition, we improved the probabilistic model of systematic error by incorporating models of strand bias and mismapping. Results: We benchmarked DRAGEN against Mutect2 (included in GATK 4.1.2) and Strelka2 (version 2.9.9) on five public synthetic and real datasets with known truth sets. DRAGEN greatly outperformed the other methods on all five datasets, producing 14-67% and 22-91% fewer false SNV calls, and 35-86% and 48-89% fewer false indel calls, than Strelka2 and Mutect2, respectively. 
DRAGEN also exhibits higher tolerance to tumor-in-normal (TiN) contamination than Strelka2, which is already equipped with a model that tolerates TiN contamination. The average end-to-end workflow runtime of the DRAGEN somatic pipeline was 77 minutes, 75% and 830% faster than Strelka2 and Mutect2, respectively, with both taking DRAGEN alignments as input. Conclusion: The DRAGEN pipeline enables reliable whole-genome analysis that can be scaled to large numbers of samples, leading to better tumor characterization and improved interpretation. We anticipate that it will ultimately fuel progress in oncology, cancer research, and precision medicine. The DRAGEN 3.5 somatic pipeline can be run either locally on a DRAGEN server or remotely in the cloud via https://basespace.illumina.com. Citation Format: Konrad Scheffler, Sangtae Kim, Varun Jain, Jeffrey Yuan, Westley Sherman, Taylor O'Connell, Eric Ojard, Lisa Murray, Rami Mehio, Severine Catreux. Accuracy improvements in somatic whole-genome small-variant calling with the DRAGEN platform [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 5463.
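The Methods describe a joint tumor/normal analysis in which the contamination expected in the normal sample depends on the allele frequency observed in the tumor. The following is a minimal illustrative sketch of that idea only, not the DRAGEN or Strelka2 implementation. The binomial read-count model, the function names, the `tin` and `err` parameters, and the read counts are all assumptions chosen for illustration; the 110x/40x depths simply echo the coverage quoted in the abstract.

```python
from math import comb, log


def binom_loglik(alt, depth, p):
    """Binomial log-likelihood of seeing `alt` variant reads out of `depth`."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
    return log(comb(depth, alt)) + alt * log(p) + (depth - alt) * log(1 - p)


def somatic_llr(t_alt, t_depth, n_alt, n_depth, tin=0.0, err=0.001):
    """Log-likelihood ratio of 'somatic' vs 'shared by both samples'.

    Hypothetical toy model (not the published one):
      - somatic: variant reads in the normal come from a base error rate
        `err` plus tumor-in-normal leakage `tin * tumor_af`.
      - shared: the normal carries the variant at the tumor allele
        frequency (germline variant or systematic artifact).
    The tumor likelihood is identical under both hypotheses and cancels.
    """
    tumor_af = t_alt / t_depth
    somatic = binom_loglik(n_alt, n_depth, err + tin * tumor_af)
    shared = binom_loglik(n_alt, n_depth, tumor_af)
    return somatic - shared


# Made-up counts: 30/110 variant reads in the tumor, 2/40 in the normal.
llr_without_tin = somatic_llr(30, 110, 2, 40, tin=0.0)
llr_with_tin = somatic_llr(30, 110, 2, 40, tin=0.15)

# A variant at ~50% in both samples should still look shared, not somatic.
llr_germline = somatic_llr(55, 110, 20, 40, tin=0.15)
```

In this toy setting, allowing for contamination raises the support for the somatic hypothesis when a few variant reads leak into the normal, while a variant present at similar frequency in both samples remains better explained as shared.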