
Introduction to Imbalanced Data

2019 
An imbalance in sample sizes among class labels makes it difficult to achieve high classification accuracy in many scientific fields, including medical diagnosis, bioinformatics, biology, and fisheries management. This difficulty is known as the “class imbalance problem” and is considered one of the 10 most important problems in data mining research; it has also been discussed widely in several machine learning workshops. The critical feature of the imbalance problem is that it significantly degrades the performance of standard classification methods, which implicitly assume balanced class distributions and equal misclassification costs for every class. Hence, new strategies are required to mitigate such imbalances, for example resampling the training data, modifying the classification algorithms, or adjusting class weights to reflect the class distribution.
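The abstract names these strategies without giving specific algorithms, so the following is only a minimal sketch of two generic instances on a toy 8:2 dataset. Random oversampling stands in for "resampling techniques", and the inverse-frequency weight n / (k · n_c) stands in for "adjustment of weights" (the same heuristic scikit-learn uses for `class_weight="balanced"`). The function names `random_oversample` and `inverse_frequency_weights` and the toy data are hypothetical, chosen for illustration. The sketch also shows why plain accuracy is misleading here: a predictor that always outputs the majority class already scores 80%.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of each smaller
    class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of all original samples carrying this label.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out

def inverse_frequency_weights(y):
    """Hypothetical helper: weight each class by n / (k * n_c), so that
    misclassifying a rare class costs more than misclassifying a common one."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

# Toy data (assumed): 8 majority samples (class 0), 2 minority samples (class 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

# Always predicting the majority class still yields 80% accuracy,
# even though no minority sample is ever recognised.
majority = Counter(y).most_common(1)[0][0]
accuracy = sum(label == majority for label in y) / len(y)

X_bal, y_bal = random_oversample(X, y)
weights = inverse_frequency_weights(y)

print(accuracy)        # 0.8
print(Counter(y_bal))  # Counter({0: 8, 1: 8})
print(weights)         # {0: 0.625, 1: 2.5}
```

Oversampling changes the training data and leaves the learner untouched, while class weights leave the data untouched and change the learner's loss. Duplicating minority samples can encourage overfitting to those few points, which is one reason the literature also explores algorithm-level modifications.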