A cross-platform evaluation of various decision tree algorithms for prognostic analysis of breast cancer data

2016 
Robustness of prediction models is an essential requirement for cancer-related diagnostic and prognostic studies. A reliable prognosis of breast cancer depends heavily on accurate identification of the diagnosed cases. Predictive analytics and learning-based methods have been shown to provide an effective framework for prognostic studies by accurately classifying data instances into the relevant classes according to the severity of the tumor. However, performance validation is an important step in benchmarking the best-performing variants of a predictive model. This study assesses the relative performance of different variants of the decision tree, a supervised learning algorithm commonly used to implement pattern-recognition-based models for the prognostic assessment of breast cancer data. Principal components analysis performs the pre-processing stage, reducing the data to the most relevant set of features for training different types of decision trees, which learn the patterns in the data in order to classify new instances. The study uses the diagnostic cases from the original Wisconsin breast cancer database. The major algorithms of the decision tree family, CART and C4.5, have been implemented on different platforms (WEKA, Python and Matlab) to compare their performance against each other. A major finding is that, for this data, classification accuracy shows a low sensitivity to feature reduction; this behavior has been investigated and reported.
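CART and C4.5 differ mainly in how they score a candidate split: CART (for classification) uses Gini impurity, while C4.5 uses the gain ratio, i.e. information gain divided by the split information. The sketch below, in plain Python with no dependencies, computes both criteria for one candidate partition of class labels. It is illustrative only: the function names and the toy benign/malignant label counts are hypothetical and not taken from the study, and the sketch leaves out everything else a full tree learner does (threshold search on continuous attributes, pruning, missing-value handling).

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels (CART's classification criterion)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels (basis of C4.5's criterion)."""
    n = len(labels)
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())


def gini_decrease(parent, children):
    """Parent Gini impurity minus the size-weighted impurity of the child partitions."""
    n = len(parent)
    weighted = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted


def gain_ratio(parent, children):
    """Information gain divided by split information, as used by C4.5."""
    n = len(parent)
    info_gain = entropy(parent) - sum(len(child) / n * entropy(child) for child in children)
    # Split information grows with the number of partitions, penalising
    # splits that fragment the node into many small children.
    split_info = sum(-(len(child) / n) * log2(len(child) / n) for child in children if child)
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy node: 6 benign and 4 malignant cases split into two children.
    parent = ["benign"] * 6 + ["malignant"] * 4
    left = ["benign"] * 5 + ["malignant"] * 1
    right = ["benign"] * 1 + ["malignant"] * 3
    print(f"CART Gini decrease: {gini_decrease(parent, [left, right]):.4f}")
    print(f"C4.5 gain ratio:    {gain_ratio(parent, [left, right]):.4f}")
```

Run as a script, it prints both scores for the toy split. On the platforms named in the abstract, WEKA's J48 is its C4.5 implementation, whereas scikit-learn documents `DecisionTreeClassifier` as an optimized version of CART whose `criterion="entropy"` option uses information gain rather than C4.5's gain ratio, so results labelled CART or C4.5 may not be strictly comparable across platforms.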