A Gene-disease-based Machine Learning Approach to Identify Prostate Cancer Biomarkers

2019 
Identifying biomarkers that can classify disease stages, or signal when a disease becomes more aggressive, is one of the most important applications of machine learning. Traditional biomarker identification approaches typically use machine learning techniques to identify a number of genes and macromolecules as biomarkers that can diagnose specific diseases or disease states with very high accuracy, using molecular measurements such as mutations, gene expression, copy number variations, and others. However, expert opinion and knowledge are required to validate such findings. We propose a new machine learning model that incorporates a knowledge-based system to integrate the findings of the DisGeNET database, a framework that provides proven relationships between diseases and genes. The machine learning pipeline starts by reducing the number of features using a filter-based feature-selection method. The DisGeNET database is then used to score each gene by its relation to the given cancer. Finally, a wrapper-based feature-selection method picks the set of genes with the highest classification accuracy. The method returned key genes from multiple data sets that classify with high accuracy while being biologically relevant, with no human intervention needed. Initial results show a high area under the curve with a handful of genes that are already proven to be related to the relevant disease and state, based on the latest published medical findings.
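The three stages described in the abstract (filter, knowledge-based scoring, wrapper) can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names, the toy expression values, and the `knowledge` dictionary (a stand-in for gene-disease scores looked up in DisGeNET) are hypothetical, and the specific choices here (a standardized mean difference as the filter, a score threshold, and greedy forward selection with a leave-one-out nearest-centroid classifier as the wrapper) are assumptions made only to keep the example self-contained.

```python
import statistics

def filter_scores(X, y, genes):
    """Filter step (assumed): standardized mean difference between the two classes."""
    scores = {}
    for j, gene in enumerate(genes):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores[gene] = abs(statistics.mean(a) - statistics.mean(b)) / spread
    return scores

def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given columns."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [statistics.mean(r[c] for r in rows) for c in cols]
        pred = min(
            centroids,
            key=lambda lab: sum((X[i][c] - m) ** 2 for c, m in zip(cols, centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)

def select_genes(X, y, genes, knowledge, keep=3, min_score=0.1, max_genes=2):
    # 1) Filter: keep the `keep` genes with the strongest class separation.
    f = filter_scores(X, y, genes)
    kept = sorted(genes, key=f.get, reverse=True)[:keep]
    # 2) Knowledge: drop genes without a sufficient gene-disease score
    #    (hypothetical threshold), then order the rest by that score.
    candidates = [g for g in kept if knowledge.get(g, 0.0) >= min_score]
    candidates.sort(key=lambda g: knowledge[g], reverse=True)
    # 3) Wrapper: greedy forward selection, stopping when accuracy stops improving.
    chosen, best = [], 0.0
    while len(chosen) < max_genes:
        trials = [
            (loo_accuracy(X, y, [genes.index(g) for g in chosen + [c]]), c)
            for c in candidates if c not in chosen
        ]
        if not trials:
            break
        acc, gene = max(trials, key=lambda t: t[0])
        if acc <= best:
            break
        chosen.append(gene)
        best = acc
    return chosen, best

# Hypothetical toy data: 8 samples x 4 genes, labels 0/1 for two disease states.
genes = ["G1", "G2", "G3", "G4"]
X = [
    [1.0, 5.0, 2.0, 0.3], [1.2, 5.1, 2.9, 0.1], [0.9, 4.8, 2.1, 0.4], [1.1, 5.2, 3.0, 0.2],
    [3.0, 7.0, 2.0, 0.2], [3.2, 7.1, 3.1, 0.4], [2.9, 6.9, 2.2, 0.1], [3.1, 7.2, 2.8, 0.3],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical gene-disease scores; G2 has no known association.
knowledge = {"G1": 0.8, "G3": 0.9, "G4": 0.2}

chosen, best = select_genes(X, y, genes, knowledge)
print(chosen, best)
```

On this toy data the sketch keeps only G1: G2 separates the classes but has no knowledge score, while G3 has a high knowledge score but does not improve classification, so the wrapper step leaves it out.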