Abstract 192: MutAnt: Mutation annotation machine learning algorithm for pathogenicity evaluation of single nonsynonymous nucleotide substitutions in cancer cells

2021 
With the wealth of next-generation sequencing (NGS) data produced over the last decade, distinguishing between pathogenic and neutral single nucleotide missense mutations is essential to understanding disease pathogenesis and to selecting and optimizing cancer treatment. Current experimental methods, such as overexpression assays, siRNA, knock-out models, and studies in cultured cells or model organisms, can help uncover the phenotype of a set of mutations in genes but are not optimal for identifying pathogenic mutations among thousands of potential candidates. In this study, we present a new mutation annotation tool, MutAnt, for the prediction of mutation pathogenicity that achieves superior performance by integrating state-of-the-art machine learning techniques with feature selection and hyperparameter tuning. To develop MutAnt, a supervised machine learning algorithm was trained to classify pathogenic and neutral single nonsynonymous nucleotide substitutions. The training dataset was based on ClinVar v20190102, the CancerGenomeInterpreter database of validated pathogenic mutations, and benign mutations from the database of Cancer Passenger Mutations (dbCPM). The Boruta algorithm with Shapley values was used for feature selection, and Bayesian optimization was applied for model hyperparameter tuning. Diverse validation methods were used in the development of MutAnt: cross-validation; the Database of Curated Mutations (DoCM); holdout on new mutations documented in the latest released version of the ClinVar database, which contains validated disease-associated single nucleotide polymorphisms (SNPs) not previously annotated in existing algorithms; mutations with high allele frequency; test dataset III from VariBench; and correlation with the mean function score in a saturation genome editing experiment for the BRCA gene.
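
The shadow-feature idea behind Boruta-style feature selection can be illustrated with a minimal sketch. This is not the MutAnt implementation: the abstract does not give the feature set or model, so the feature names (`conservation_score`, `random_flag`) and the toy data are hypothetical, absolute Pearson correlation stands in for the Shapley-value importances, and a majority vote replaces Boruta's statistical test.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; a simple stand-in importance score."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def boruta_like_selection(columns, labels, n_trials=50, seed=0):
    """Keep features that beat the best shuffled 'shadow' feature in most trials.

    columns: dict mapping feature name -> list of values (one per sample).
    labels:  list of 0/1 labels (1 = pathogenic, 0 = neutral).
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_trials):
        # Shadow features: each real column shuffled to break any link to the labels.
        shadow_scores = []
        for values in columns.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(abs_correlation(shuffled, labels))
        threshold = max(shadow_scores)
        for name, values in columns.items():
            if abs_correlation(values, labels) > threshold:
                hits[name] += 1
    # Simplification: majority vote instead of Boruta's statistical test.
    return [name for name, count in hits.items() if count > n_trials / 2]


# Hypothetical toy data: one informative feature, one uncorrelated feature.
labels = [0, 1] * 20
columns = {
    "conservation_score": [0.1, 0.9, 0.2, 0.8] * 10,
    "random_flag": [0, 0, 1, 1] * 10,
}
selected = boruta_like_selection(columns, labels)
print(selected)  # ['conservation_score']
```

In a real pipeline, the importance score would come from a trained model (for example, mean absolute Shapley values per feature), and the shadow comparison would be repeated with refitting at each iteration.
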
MutAnt performed better than other methods on all of the tested validation strategies, as demonstrated by its high F1-score (0.95 to 0.99) and ROC-AUC (0.97 to 0.99) in classifying mutations in each validation dataset. Furthermore, the present approach predicted novel mutations that are not covered by extensive population studies with higher accuracy than other methods, as demonstrated by its prediction of the clinically relevant BRCA variants in the saturation genome editing experiment. In conclusion, MutAnt outperformed all compared mutation annotation algorithms in this study for the classification of clinically relevant benign and pathogenic mutations in the ClinVar dataset. Importantly, this approach allows prediction of novel mutations, which in turn allows for the selection of targeted therapies for a broader set of patients.

Citation Format: Aleksandr Sarachakov, Viktor Svekolkin, Zoia Antysheva, Jessica H. Brown, Alexander Bagaev, Nathan Fowler. MutAnt: Mutation annotation machine learning algorithm for pathogenicity evaluation of single nonsynonymous nucleotide substitutions in cancer cells [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 192.
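
For readers unfamiliar with the two metrics reported in the abstract, the following sketch shows how an F1-score and a ROC-AUC are computed for a binary pathogenic/neutral classifier. The labels and scores are made-up toy values, not data from the study, and the 0.5 decision threshold is an assumption chosen for illustration.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (pathogenic) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = pathogenic, 0 = neutral; scores are hypothetical model outputs.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.3f}, ROC-AUC = {auc:.3f}")
```

The F1-score depends on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is why the two are often reported together.
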