Comparison of pathogenicity prediction tools on somatic variants.

2020 
Genomic sequencing has been increasingly used during the past decade in managing patients with cancer. Interpretation of somatic variants and their pathogenicity is often complex. Pathogenicity prediction tools are commonly used as part of the expert interpretation of somatic variants, but most of these tools were initially developed for germline variants. The aim of this study was to benchmark their performance on somatic variants. A gold standard list of 4319 somatic single-nucleotide variants was assembled, each classified as oncogenic (n = 2996) or neutral (n = 1323) based on its presence in curated databases or on its allele frequency in the general population. These variants were annotated with the most commonly used prediction tools, drawn from dbNSFP (Database for Nonsynonymous SNPs' Functional Predictions) and UMD-Predictor (Universal Mutation Database Predictor), and performance metrics were computed for each tool. Stratification of the prediction tools based on Matthews correlation coefficient and area under the receiver operating characteristic curve allowed the identification of the top-performing ones, namely, CADD (Combined Annotation-Dependent Depletion), Eigen or Eigen-PC (Eigen Principal Components), PolyPhen-2 (Polymorphism Phenotyping version 2), PROVEAN (Protein Variation Effect Analyzer), UMD-Predictor, and REVEL (Rare Exome Variant Ensemble Learner). Interestingly, SIFT (Sorting Intolerant From Tolerant), which is a commonly used prediction tool for somatic variants, was ranked in the second performance category. Combining tools in pairs only marginally improved performance, mainly because of the occurrence of discordant predictions.
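The two ranking metrics named above can be illustrated with a minimal Python sketch. It is not the study's pipeline: the labels, scores, and 0.5 cutoff below are hypothetical toy values, and the function names `mcc` and `auroc` are chosen here for illustration only. Oncogenic variants are encoded as 1 and neutral variants as 0.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = oncogenic, 0 = neutral)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
threshold = 0.5  # assumed cutoff; real tools each define their own
preds = [1 if s >= threshold else 0 for s in scores]

print(f"MCC   = {mcc(labels, preds):.3f}")
print(f"AUROC = {auroc(labels, scores):.3f}")
```

The Matthews correlation coefficient depends on the chosen cutoff, whereas the area under the curve summarizes ranking quality across all cutoffs, which is one reason to report both when a gold standard is imbalanced, as it is here (2996 oncogenic versus 1323 neutral).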