Decision Tree Model Based Gene Selection and Classification for Breast Cancer Risk Prediction

2020 
Breast cancer is the most frequently diagnosed cancer among women worldwide and ranks second after lung cancer. Early diagnosis allows early treatment, which can increase the chance of survival for women with this disease. Recently, microarray technology has created an opportunity to make cancer diagnosis faster and easier. However, the most common challenge with gene expression data is high dimensionality, i.e., thousands of genes but only a few tens of patients, which makes any prediction approach difficult to apply. To address this challenge, we propose a C5.0-based feature selection approach. Its main strength lies in combining two feature selection techniques: a Fisher-score-based filter method and the built-in feature selection ability of C5.0. The classification algorithms used to assess the approach in terms of prediction accuracy are Artificial Neural Networks, the C5.0 Decision Tree, Logistic Regression, and Support Vector Machine. Compared with state-of-the-art models, our approach predicts breast cancer with the highest accuracy while using a strict minimum of genes.
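To make the filter stage concrete, the following is a minimal sketch of Fisher-score gene ranking. It assumes the common definition of the Fisher score (between-class scatter of a gene's mean divided by its pooled within-class variance); the abstract does not state which variant the authors use, so the formula, the function names `fisher_scores` and `top_k_genes`, and the toy data are illustrative assumptions. The second stage, where C5.0 keeps only the genes it actually splits on, is not shown here.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per gene (column of X).

    X: list of samples, each a list of expression values.
    y: list of class labels, one per sample.
    Assumed definition: sum_c n_c * (mean_c - mean)^2 / sum_c n_c * var_c.
    """
    n_genes = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_genes):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # scatter of class means around the overall mean
        within = 0.0   # pooled variance inside each class
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            n_c = len(values)
            between += n_c * (fmean(values) - overall_mean) ** 2
            within += n_c * pvariance(values)
        # Guard against division by zero for genes with no within-class spread.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_genes(X, y, k):
    """Return the column indices of the k genes with the highest scores."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 4 patients, 3 genes, 2 classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [3.0, 5.5, 0.1],
    [3.2, 4.5, 0.8],
]
y = [0, 0, 1, 1]
selected = top_k_genes(X, y, k=2)
```

In this toy example, gene 0 separates the two classes far better than the others, so it ranks first. In a two-stage pipeline like the one described, the reduced gene set would then be passed to the decision tree, whose own splits narrow the selection further.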