An Improved SMOTE Algorithm Using Clustering
2020
The classification of imbalanced data has received widespread attention. To address the poor classification of minority classes, this paper proposes an improved SMOTE algorithm based on clustering. First, noise samples in the dataset are identified and removed using the KNN algorithm. Next, the minority samples are divided into sub-clusters using K-means, the sample density of each sub-cluster is calculated, and sub-clusters with relatively low sample density are given a higher sampling weight. Finally, new minority samples are synthesized between the samples of each sub-cluster and the sub-cluster center. Seven KEEL datasets are selected for comparative experiments, and a random forest classifier is used to classify the oversampled, balanced datasets. F-measure, Recall, and G-mean are used as evaluation metrics. On these metrics, the improved algorithm achieves better classification performance.
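The three steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not give the exact noise rule or density formula, so the sketch assumes that a minority sample is noise when all of its k nearest neighbours belong to the majority class, and that sub-cluster density is the number of members divided by their mean distance to the center. All function names and parameters (`remove_noise`, `kmeans`, `sampling_weights`, `oversample`, `n_clusters`, `k`, `seed`) are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def remove_noise(minority, majority, k=3):
    """Assumed rule: drop minority points whose k nearest neighbours are all majority."""
    kept = []
    for i, x in enumerate(minority):
        # Label 0 marks a minority neighbour, label 1 a majority neighbour.
        others = [(dist(x, m), 0) for j, m in enumerate(minority) if j != i]
        others += [(dist(x, m), 1) for m in majority]
        nearest = sorted(others)[:k]
        if not all(label == 1 for _, label in nearest):
            kept.append(x)
    return kept


def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain Lloyd's k-means; returns (members, center) pairs for non-empty clusters."""
    centers = rng.sample(points, n_clusters)
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for x in points:
            idx = min(range(len(centers)), key=lambda c: dist(x, centers[c]))
            clusters[idx].append(x)
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [(c, ctr) for c, ctr in zip(clusters, centers) if c]


def sampling_weights(clusters):
    """Weights inversely proportional to the assumed density (size / mean radius)."""
    inverse = []
    for members, center in clusters:
        radius = sum(dist(x, center) for x in members) / len(members)
        density = len(members) / (radius + 1e-9)
        inverse.append(1.0 / density)
    total = sum(inverse)
    return [w / total for w in inverse]


def oversample(minority, majority, n_new, n_clusters=2, k=3, seed=0):
    """Generate n_new synthetic minority points between samples and cluster centers."""
    rng = random.Random(seed)
    clean = remove_noise(minority, majority, k)
    if not clean:
        return []
    clusters = kmeans(clean, min(n_clusters, len(clean)), rng)
    weights = sampling_weights(clusters)
    synthetic = []
    for _ in range(n_new):
        # Sparse sub-clusters are picked more often because of their higher weight.
        members, center = rng.choices(clusters, weights=weights)[0]
        x = rng.choice(members)
        gap = rng.random()
        synthetic.append([xi + gap * (ci - xi) for xi, ci in zip(x, center)])
    return synthetic
```

Because each synthetic point lies on the segment between a real minority sample and its sub-cluster center, it stays inside the region already occupied by that sub-cluster, which is the motivation for interpolating toward the center rather than toward an arbitrary neighbour.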