Abstract 1556: Algorithms for discovery of somatic single nucleotide mutation display specific artifacts and different detection capabilities under the effect of read coverage and sample heterogeneity

2017 
Understanding the performance and capabilities of the bioinformatics algorithms used to discover somatic variants helps scientists choose an appropriate tool for cancer research. In this study, we developed a comparison approach and created a series of data sets that can be used to provide such guidance. We mixed reads from two well-characterized individuals, NA12878 and NA24385, to generate a series of data sets with different coverages and degrees of sample heterogeneity. We then used these data sets to evaluate five commonly used somatic mutation detection tools. Our results indicate that read coverage has a significant impact on the accuracy and detection capability of individual mutation calling algorithms. A mutation caller that performs well at high read coverage may perform poorly at low read coverage. Likewise, a tool that performs well in calling variants in a relatively homogeneous sample may not have the same power to detect rare variants with low mutation allele frequency. In addition, we demonstrated that different mutation calling algorithms are associated with specific artifacts that are sensitive to read coverage. Furthermore, large numbers of false positives and false negatives were shared by all five callers, indicating that other factors, such as read alignment, library preparation, and even the properties of the sequencing platform, could be sources of false discovery for somatic variants. We observed similar behavior of the five variant calling algorithms on sequencing data from a pair of matched tumor/normal cell lines, confirming the findings from the comparative analyses of the read mixtures from the two normal individuals. Our findings are expected to facilitate the selection of bioinformatics pipelines that are fit for specific purposes in sequencing-based cancer research.
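The read-mixing design described above can be illustrated with a small arithmetic sketch. It is not the authors' pipeline: the function names, the idea of treating one individual as the "tumor-like" source, and the example depths are all assumptions made for illustration. The sketch shows how a target coverage and mixing fraction translate into per-source read sampling fractions, and what allele frequency a variant private to the tumor-like source would be expected to show in the mixture.

```python
def subsample_fractions(target_coverage, tumor_fraction, cov_tumor, cov_normal):
    """Fraction of reads to draw from each source to build a mixed data set.

    Hypothetical helper: cov_tumor and cov_normal are the mean depths of the
    two source read sets; tumor_fraction is the share of reads in the mixture
    that should come from the tumor-like source.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    frac_tumor = target_coverage * tumor_fraction / cov_tumor
    frac_normal = target_coverage * (1.0 - tumor_fraction) / cov_normal
    if frac_tumor > 1.0 or frac_normal > 1.0:
        raise ValueError("source coverage is too low for the requested mixture")
    return frac_tumor, frac_normal


def expected_vaf(tumor_fraction, alt_allele_fraction=0.5):
    """Expected allele frequency of a variant private to the tumor-like source.

    alt_allele_fraction is 0.5 for a heterozygous site and 1.0 for a
    homozygous site in that source (assumed diploid).
    """
    return tumor_fraction * alt_allele_fraction


if __name__ == "__main__":
    # Illustrative numbers only: 100x mixture, 20% tumor-like reads,
    # both sources sequenced to 300x.
    f_tumor, f_normal = subsample_fractions(100, 0.2, 300, 300)
    print(f"sample {f_tumor:.3f} of tumor-like reads, {f_normal:.3f} of normal reads")
    print(f"expected VAF of a private heterozygous variant: {expected_vaf(0.2):.2f}")
```

Under these assumptions, lowering the mixing fraction lowers the expected allele frequency proportionally, which is why low-purity mixtures probe a caller's ability to detect rare variants, while lowering the target coverage reduces the number of reads supporting each variant.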
Citation Format: Wenming Xiao, Leihong Wu, Gokhan Yavas, Huixiao Hong, Baitang Ning, Weida Tong, Eric F. Donaldson, Zivana Tezak, Reena Philip, Howard Jacob, Louis M. Staudt. Algorithms for discovery of somatic single nucleotide mutation display specific artifacts and different detection capabilities under the effect of read coverage and sample heterogeneity [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 1556. doi:10.1158/1538-7445.AM2017-1556