An Improved SMOTE Algorithm Using Clustering

2020 
The classification of imbalanced data has received widespread attention. Because minority classes in such datasets are often classified poorly, this paper proposes an improved SMOTE algorithm based on clustering. First, noise samples in the dataset are identified with the KNN algorithm and removed. Next, the minority samples are divided into sub-clusters with K-means, the sample density of each sub-cluster is calculated, and sub-clusters with relatively low sample density are given a higher sampling weight. Finally, new minority samples are synthesized between the samples of each sub-cluster and the sub-cluster center. Seven KEEL datasets are selected for comparative experiments, and a random forest classifier is used to classify the datasets after they have been balanced by oversampling. F-measure, Recall, and G-mean are used as evaluation metrics. In these experiments, the improved algorithm achieves better classification performance.
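The steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the noise rule, the density formula, or the weighting scheme, so the choices here are assumptions. A minority sample is treated as noise when all of its k nearest neighbours belong to another class, density is taken as the sample count divided by the mean distance to the sub-cluster center, and sampling weights are proportional to the inverse of that density. The function names (`remove_noise`, `kmeans`, `cluster_smote`) and parameters are hypothetical, and the sketch assumes a binary problem.

```python
import math
import random

def remove_noise(X, y, minority, k=3):
    """Drop minority samples whose k nearest neighbours are all non-minority.
    (Assumed noise rule; the abstract only says KNN is used.)"""
    keep = []
    for i, xi in enumerate(X):
        if y[i] == minority:
            others = (j for j in range(len(X)) if j != i)
            nbrs = sorted(others, key=lambda j: math.dist(xi, X[j]))[:k]
            if all(y[j] != minority for j in nbrs):
                continue  # isolated among other-class samples: treat as noise
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain Lloyd's K-means; returns (clusters, centers)."""
    centers = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members (keep it if empty).
        centers = [[sum(col) / len(c) for col in zip(*c)] if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters, centers

def cluster_smote(X, y, minority, n_clusters=2, k=3, seed=0):
    """Balance a binary dataset by oversampling sparse minority sub-clusters."""
    rng = random.Random(seed)
    X, y = remove_noise(X, y, minority, k)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_new = len(y) - 2 * len(minority_pts)  # majority count minus minority count
    if n_new <= 0 or len(minority_pts) < n_clusters:
        return list(X), list(y)
    clusters, centers = kmeans(minority_pts, n_clusters, rng)
    pairs = [(c, m) for c, m in zip(clusters, centers) if c]
    # Assumed density: count / mean distance to center. Weight = 1 / density,
    # so sparser sub-clusters are sampled more often.
    weights = []
    for members, center in pairs:
        spread = sum(math.dist(p, center) for p in members) / len(members)
        weights.append((spread + 1e-12) / len(members))
    X_new, y_new = list(X), list(y)
    for _ in range(n_new):
        members, center = rng.choices(pairs, weights=weights)[0]
        p = rng.choice(members)
        gap = rng.random()
        # Interpolate between the chosen sample and its sub-cluster center.
        X_new.append([pi + gap * (ci - pi) for pi, ci in zip(p, center)])
        y_new.append(minority)
    return X_new, y_new
```

Calling `cluster_smote(X, y, minority=1)` on a list of feature vectors `X` and labels `y` returns the filtered data plus enough synthetic minority samples to match the majority count. The balanced output could then be passed to any classifier, such as the random forest used in the experiments.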