Integrating cluster analysis with granular computing for imbalanced data classification problem – A case study on prostate cancer prognosis

2018 
Abstract: Analyzing imbalanced datasets is a critical and challenging task in data mining, since it requires special treatment for clusters of different sizes. Imbalanced datasets are common in domains such as medicine. This study proposes a classification algorithm based on the information granulation (IG) concept for handling imbalanced datasets. The proposed algorithm assembles data from the majority classes into granules to balance the class ratio within the data, and it works in two stages. The first stage generates a set of IGs using metaheuristic-based automatic clustering algorithms: dynamic clustering using particle swarm optimization (DCPSO), genetic algorithm K-means (GA K-means), and artificial bee colony K-means (ABC K-means). The second stage applies a classification algorithm to classify the data. The proposed algorithms are verified on both balanced and imbalanced benchmark datasets, and simulation results show that they give promising classification results. The study also applies the proposed algorithms to a prostate cancer prognosis classification problem, where they are used to predict the survival rate of prostate cancer patients from medical data. The results show that the proposed algorithms have a lower error rate.
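The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain Lloyd's K-means stands in for the metaheuristic clustering methods (DCPSO, GA K-means, ABC K-means), a 1-nearest-neighbor rule stands in for the unspecified classifier, and the number of granules is simply set equal to the minority class size. All function names and the synthetic data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means (a stand-in for DCPSO / GA / ABC K-means).

    Returns k centroids, each a list of floats.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                mean = [sum(c) / len(members) for c in zip(*members)]
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def build_balanced_training_set(majority, minority, seed=0):
    """Stage 1: replace majority samples with granules (cluster centroids).

    Assumption: one granule per minority sample, giving a 1:1 class ratio.
    Majority granules get label 0, minority samples get label 1.
    """
    granules = kmeans(majority, len(minority), seed=seed)
    train = [(g, 0) for g in granules]
    train += [(list(p), 1) for p in minority]
    return train


def predict(train, x):
    """Stage 2: classify x with a 1-nearest-neighbor rule (assumed classifier)."""
    return min(train, key=lambda item: sq_dist(item[0], x))[1]


def make_demo_data(seed=0):
    """Synthetic 9:1 imbalanced data: 90 majority points, 10 minority points."""
    rng = random.Random(seed)
    majority = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(90)]
    minority = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(10)]
    return majority, minority
```

A possible usage: call `make_demo_data()`, pass the two lists to `build_balanced_training_set`, and classify new points with `predict`. The 90 majority samples are summarized by 10 granules, so the training set holds 10 items per class instead of a 9:1 ratio.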