Cross-validation Metrics for Evaluating Classification Performance on Imbalanced Data

2019 
Imbalanced data is a common issue in classification, because training on such data tends to fit the model too closely to the majority class. Ensemble techniques are one alternative for handling imbalanced data. This paper compares metrics for measuring classification performance on imbalanced data through an empirical study of cabbage image classification. The metrics compared are accuracy, F1 score, g-mean, MCC (Matthews correlation coefficient), Cohen's Kappa statistic, and AUC. Three ensemble methods are used: bagging, Breiman's boosting, and Freund's boosting. The results of the empirical study indicate that accuracy, F1 score, and g-mean produce values that do not reflect the actual confusion cases. Accuracy, F1 score, g-mean, MCC, and Kappa yield the same values under different confusion-matrix conditions, whereas AUC yields different values for different confusion matrices. Based on these results, AUC is the most robust metric for measuring performance under imbalanced conditions.
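As a minimal sketch of the comparison the abstract describes (not the paper's code), the snippet below computes all six metrics with scikit-learn on a synthetic 9:1 imbalanced dataset, using a bagging ensemble as a stand-in for one of the three ensembles; the dataset, classifier settings, and split are illustrative assumptions, since the paper's cabbage image data is not available here.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             cohen_kappa_score, roc_auc_score, recall_score)

# Synthetic 9:1 imbalanced binary data (illustrative stand-in for the
# paper's cabbage images).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Bagging ensemble (one of the three ensembles the paper compares).
clf = BaggingClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)[:, 1]

# g-mean = sqrt(sensitivity * specificity); scikit-learn has no built-in,
# so compute it from the per-class recalls.
sens = recall_score(y_te, y_pred)                # recall on the minority class
spec = recall_score(y_te, y_pred, pos_label=0)   # recall on the majority class
g_mean = np.sqrt(sens * spec)

print(f"accuracy: {accuracy_score(y_te, y_pred):.3f}")
print(f"F1:       {f1_score(y_te, y_pred):.3f}")
print(f"g-mean:   {g_mean:.3f}")
print(f"MCC:      {matthews_corrcoef(y_te, y_pred):.3f}")
print(f"Kappa:    {cohen_kappa_score(y_te, y_pred):.3f}")
print(f"AUC:      {roc_auc_score(y_te, y_prob):.3f}")

Note that AUC is computed from the predicted scores (y_prob) rather than the hard labels, which is one reason it can distinguish classifiers that produce identical confusion matrices, consistent with the abstract's finding.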