Position-Aware Safe Boundary Interpolation Oversampling

2021 
The class imbalance problem is characterized by the unequal distribution of different class samples, usually resulting in a learning bias toward the majority class. In the past decades, kinds of techniques have been proposed to alleviate this problem. Among those approaches, one promising method, interpolation-based oversampling, proposes to generate synthetic minority samples based on selected reference data, which can effectively solve the skewed distribution of data samples. However, there are several unsolved issues in interpolation-based oversampling. Existing methods often suffer from noisy synthetic samples due to improper data clustering and unsatisfactory reference selection. In this paper, we propose the position-aware safe boundary interpolation oversampling algorithm (PABIO) to address such issues. We firstly introduce a combined clustering algorithm for minority samples to overcome the shortage of clustering methods which are only distance-based or density-based. Then a position-aware interpolation-based oversampling algorithm is proposed for different minority clusters. Especially, we develop a novel method to leverage the majority class information to learn a safe boundary for generating synthetic points. The proposed PABIO is evaluated on multiple imbalanced datasets classified by two base classifiers: support vector machine (SVM) and C4.5 decision tree classifier. Experimental results show that our proposed PABIO outperforms other baselines among benchmark datasets.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    32
    References
    0
    Citations
    NaN
    KQI
    []