GABoost: A Clustering Based Undersampling Algorithm for Highly Imbalanced Datasets Using Genetic Algorithm

2019 
Datasets with an imbalanced class distribution pose a challenging problem in many application domains. Current data mining and machine learning algorithms cannot learn efficiently from imbalanced data on their own, so other techniques must be considered alongside them. One solution is to develop preprocessing methods that balance the datasets and to combine these methods with existing algorithms. In this paper, we propose GABoost, a new hybrid clustering-based undersampling technique for learning from imbalanced data that uses a genetic algorithm and AdaBoost. GABoost is an attractive alternative to SMOTEBoost, RUSBoost, and CUSBoost. Based on experimental results on 44 imbalanced datasets, we strongly recommend GABoost for improving the performance of classification models learned from highly imbalanced datasets.
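The abstract does not spell out GABoost's exact procedure. The following is a minimal sketch of the general idea, a clustering-based undersampler whose majority-class subset is tuned by a genetic algorithm, built entirely on illustrative assumptions rather than on the paper's method. It assumes that k-means clusters the majority class, that each chromosome is a set of majority indices seeded by drawing evenly across clusters, that fitness is the G-mean of a 1-nearest-neighbour classifier on a held-out validation set, and that the target is a 1:1 class ratio. All function names and parameters (`kmeans`, `stratified_pick`, `ga_undersample`, `k`, `pop_size`, `generations`) are hypothetical. The AdaBoost stage is not shown; in RUSBoost- and CUSBoost-style methods, the undersampling step is typically repeated inside each boosting round.

```python
import math
import random

# Hypothetical sketch: NOT the GABoost authors' implementation.
# Labels: 0 = majority class, 1 = minority class.

def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means; returns a cluster label for every point."""
    if rng is None:
        rng = random.Random(0)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def stratified_pick(labels, n, rng):
    """Pick n distinct majority indices, drawing round-robin across clusters."""
    n = min(n, len(labels))
    pools = {}
    for i, lab in enumerate(labels):
        pools.setdefault(lab, []).append(i)
    for pool in pools.values():
        rng.shuffle(pool)
    picked = []
    while len(picked) < n:
        for pool in pools.values():
            if pool and len(picked) < n:
                picked.append(pool.pop())
    return picked

def gmean_1nn(train, evaluate):
    """Geometric mean of per-class recall for a 1-NN classifier (assumed fitness)."""
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, y in evaluate:
        pred = min(train, key=lambda t: math.dist(t[0], x))[1]
        totals[y] += 1
        hits[y] += pred == y
    return math.sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))

def ga_undersample(majority, minority, valid, k=4, pop_size=12, generations=15, seed=0):
    """Return minority + a GA-selected majority subset of equal size."""
    rng = random.Random(seed)
    n = len(minority)  # assumed target: a balanced 1:1 training set
    labels = kmeans([p for p, _ in majority], k, rng=rng)

    def fitness(chrom):
        return gmean_1nn(minority + [majority[i] for i in chrom], valid)

    # Seed the population with cluster-stratified subsets.
    population = [stratified_pick(labels, n, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection; the surviving half is kept unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw n distinct indices from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), n)
            # Mutation: swap one chosen index for an unchosen majority index.
            chosen = set(child)
            unused = [i for i in range(len(majority)) if i not in chosen]
            if unused and rng.random() < 0.3:
                child[rng.randrange(n)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return minority + [majority[i] for i in best]

def make_data(rng, n_major, n_minor):
    """Toy 2-D data: a large majority blob and a small shifted minority blob."""
    major = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n_major)]
    minor = [((rng.gauss(2, 0.7), rng.gauss(2, 0.7)), 1) for _ in range(n_minor)]
    return major, minor

if __name__ == "__main__":
    rng = random.Random(42)
    majority, minority = make_data(rng, 200, 20)
    v_major, v_minor = make_data(rng, 50, 10)
    balanced = ga_undersample(majority, minority, v_major + v_minor)
    print(len(balanced), sum(y for _, y in balanced))  # 40 20
```

Seeding the population from clusters keeps every region of the majority class represented from the start, which is the intuition behind clustering-based undersamplers such as CUSBoost; the genetic search then refines which representatives to keep. The 1-NN fitness is only a cheap stand-in; any classifier and imbalance-aware metric could be substituted.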