Comparison of the C4.5 and a Naive Bayes Classifier for the Prediction of Lung Cancer Survivability

2012 
Numerous data mining techniques have been developed to extract information and identify patterns and predict trends from large data sets. In this study, two classification techniques, the J48 implementation of the C4.5 algorithm and a Naive Bayes classifier are applied to predict lung cancer survivability from an extensive data set with fifteen years of patient records. The purpose of the project is to verify the predictive effectiveness of the two techniques on real, historical data. Besides the performance outcome that renders J48 marginally better than the Naive Bayes technique, there is a detailed description of the data and the required pre-processing activities. The performance results confirm expectations while some of the issues that appeared during experimentation, underscore the value of having domain-specific understanding to leverage any domain-specific characteristics inherent in the data.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    16
    References
    34
    Citations
    NaN
    KQI
    []