Mixed Data Balancing through Compact Sets Based Instance Selection

Yenny Villuendas-Rey, María Matilde García-Lorenzo

2013

Learning from datasets with an imbalanced class distribution is an important problem in Pattern Recognition. This paper introduces a novel data balancing algorithm based on compact set clustering of the majority class. The proposed algorithm can handle mixed as well as incomplete data, and works with arbitrary dissimilarity functions. Numerical experiments on repository databases show that the proposed method performs well according to area under the ROC curve and imbalance ratio.
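The abstract names three ingredients: the imbalance ratio, a dissimilarity that copes with mixed and incomplete data, and compact set clustering of the majority class. The following is a minimal Python sketch of those ingredients, not the authors' implementation. It assumes the common definition of the imbalance ratio (majority count divided by minority count), a HEOM-style dissimilarity chosen only for illustration (the paper allows arbitrary dissimilarity functions), and the usual definition of compact sets as connected components of the graph that links every object to its most similar objects. All function names and the sample data are hypothetical, and the abstract does not say how instances are selected from each compact set, so that step is left out.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Majority class count divided by minority class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def mixed_dissimilarity(a, b, ranges):
    """HEOM-style dissimilarity, used here only as an illustrative choice.

    ranges[i] is the value range of numeric attribute i, or None if
    attribute i is categorical. A missing value is represented by None
    and contributes the maximum per-attribute difference of 1.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            d = 1.0                        # incomplete data: maximal difference
        elif r is None:
            d = 0.0 if x == y else 1.0     # categorical: overlap measure
        else:
            d = abs(x - y) / r if r > 0 else 0.0  # numeric: range-normalised
        total += d * d
    return math.sqrt(total)


def compact_sets(objects, dissim):
    """Group objects into connected components of the nearest-neighbour graph.

    Each object is linked to every object at its minimum dissimilarity
    (ties included); the components are returned as sorted index lists.
    """
    n = len(objects)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        dists = [(dissim(objects[i], objects[j]), j) for j in range(n) if j != i]
        if not dists:
            continue
        best = min(d for d, _ in dists)
        for d, j in dists:
            if d == best:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical majority-class objects: one numeric and one categorical
# attribute, with a missing numeric value in the last object.
majority = [(1.0, "a"), (1.1, "a"), (5.0, "b"), (5.2, "b"), (None, "b")]
ranges = [5.0, None]  # assumed range of the numeric attribute

groups = compact_sets(majority, lambda a, b: mixed_dissimilarity(a, b, ranges))
ratio = imbalance_ratio(["majority"] * 8 + ["minority"] * 2)
```

With this sample data the first two objects form one compact set and the remaining three form another, since the object with the missing value is equally close to both of the other "b" objects. A balancing method built on this idea would then keep a subset of each compact set; which subset to keep is the paper's contribution and is not reproduced here.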

Keywords:

Pattern recognition
Computer science
Cluster analysis
Machine learning
Compact space
Instance selection
Artificial intelligence
Data mining
Imbalanced data
Quality performance
Area under the ROC curve
Data balancing
Majority class
