An Improved SMOTE Algorithm Using Clustering

Zhao Xiang, Yixin Su, Jian Lan, Diliang Li, Yuying Hu, Zixiao Li

2020

The problem of imbalanced data classification has received widespread attention. To address the poor classification performance on minority classes, this paper proposes an improved SMOTE algorithm based on clustering. First, noise samples in the dataset are identified and removed using the KNN algorithm. The minority samples are then divided into sub-clusters using K-means, the sample density of each sub-cluster is calculated, and sub-clusters with relatively low sample density are given a higher sampling weight. Finally, new minority samples are synthesized between the samples of each sub-cluster and the sub-cluster center. Seven KEEL datasets are selected for comparative experiments, and a random forest classifier is used to classify the oversampled, balanced datasets. F-measure, Recall, and G-mean are used as evaluation metrics. The improved algorithm achieves better classification results in these comparisons.
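The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not give the exact noise rule, density formula, or weighting scheme, so the choices below are assumptions. A minority sample is treated as noise when all of its k nearest neighbours belong to the majority class, density is taken as cluster size divided by mean distance to the cluster center, and sampling weights are proportional to inverse density. The function names (`remove_noise`, `kmeans`, `cluster_smote`) and all parameters are hypothetical.

```python
import numpy as np

def remove_noise(X, y, minority=1, k=5):
    """Drop minority samples whose k nearest neighbours are all majority (assumed rule)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    noisy = (y == minority) & (y[nn] != minority).all(axis=1)
    return X[~noisy], y[~noisy]

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    return np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)

def kmeans(X, n_clusters, rng, n_iter=50):
    """Plain Lloyd's K-means; returns (labels, centers)."""
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers), centers

def cluster_smote(X, y, minority=1, n_clusters=3, k=5, seed=0):
    rng = np.random.default_rng(seed)
    X, y = remove_noise(X, y, minority, k)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))   # samples needed to balance
    if n_new <= 0 or len(X_min) < n_clusters:
        return X, y
    labels, centers = kmeans(X_min, n_clusters, rng)

    # Assumed density: members per unit of mean distance to the center.
    density = np.full(n_clusters, np.inf)             # empty cluster -> weight 0
    for j in range(n_clusters):
        members = X_min[labels == j]
        if len(members):
            spread = np.linalg.norm(members - centers[j], axis=1).mean()
            density[j] = len(members) / (spread + 1e-12)
    weights = 1.0 / density                           # sparser cluster -> larger weight
    weights /= weights.sum()
    counts = rng.multinomial(n_new, weights)

    synth = []
    for j, c in enumerate(counts):
        if c == 0:
            continue
        members = X_min[labels == j]
        base = members[rng.integers(len(members), size=c)]
        gap = rng.random((c, 1))
        synth.append(base + gap * (centers[j] - base))  # point between sample and center
    X_new = np.vstack(synth)
    y_new = np.full(len(X_new), minority)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

Calling `cluster_smote(X, y)` on a feature matrix `X` and a label vector `y` returns a dataset in which the minority class has been oversampled to the size of the majority class. Interpolating toward the sub-cluster center, rather than toward a random neighbour as in standard SMOTE, keeps each synthetic point inside the region spanned by the minority samples.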
