Probability Density Machine: A New Solution of Class Imbalance Learning

2021 
Class imbalance learning (CIL) is an important branch of machine learning as, in general, it is difficult for classification models to learn from imbalanced data; meanwhile, skewed data distribution frequently exists in various real-world applications. In this paper, we introduce a novel solution of CIL called Probability Density Machine (PDM). First, in the context of Gaussian Naive Bayes (GNB) predictive model, we analyze the reason why imbalanced data distribution makes the performance of predictive model decline in theory and draw a conclusion regarding the impact of class imbalance that is only associated with the prior probability, but does not relate to the conditional probability of training data. Then, in such context, we show the rationality of several traditional CIL techniques. Furthermore, we indicate the drawback of combining GNB with these traditional CIL techniques. Next, profiting from the idea of K-nearest neighbors probability density estimation (KNN-PDE), we propose the PDM which is an improved GNB-based CIL algorithm. Finally, we conduct experiments on lots of class imbalance data sets, and the proposed PDM algorithm shows the promising results.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    49
    References
    0
    Citations
    NaN
    KQI
    []