Gene pathogenicity prediction of Mendelian diseases via the Random Forest algorithm

2019 
The study of Mendelian diseases and the identification of their causative genes are of great significance in genetics. Two important questions are how to evaluate the pathogenicity of a gene and how many Mendelian disease genes exist in total. Very few studies have addressed either question to date, so we attempt to answer both here. We calculated a gene pathogenicity prediction (GPP) score using a machine learning approach (the random forest algorithm) to evaluate the pathogenicity of genes. Applied to the testing gene set, the GPP score achieved an accuracy of 80%, a recall of 93% and an area under the curve (AUC) of 0.87. We estimate that a total of 10,399 protein-coding genes are Mendelian disease genes. Furthermore, we found that the GPP score was positively correlated with disease severity. Our results indicate that the GPP score may provide a robust and reliable guideline for predicting the pathogenicity of protein-coding genes. To our knowledge, this is the first attempt to estimate the total number of Mendelian disease genes.
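The three evaluation metrics reported above (accuracy, recall and AUC) can be illustrated with a minimal Python sketch. The labels and scores below are invented toy values, not data from the study, and the function names and the 0.5 decision threshold are assumptions made for illustration only; the abstract does not state how the GPP score was thresholded.

```python
# Toy illustration of accuracy, recall and AUC for a score-based classifier.
# All data, names and the 0.5 threshold are hypothetical, not from the study.

def accuracy(labels, scores, threshold=0.5):
    """Fraction of genes whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)

def recall(labels, scores, threshold=0.5):
    """Fraction of true disease genes (label 1) scored at or above the threshold."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    positives = sum(1 for y in labels if y == 1)
    return true_pos / positives

def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# 1 = known disease gene, 0 = non-disease gene (toy labels)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical pathogenicity scores in [0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print(accuracy(labels, scores))  # 0.75
print(recall(labels, scores))    # 0.75
print(auc(labels, scores))       # 0.9375
```

Accuracy and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a classifier can report a high recall alongside a more modest accuracy.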