Synthetic minority image over-sampling technique: How to improve AUC for glioblastoma patient survival prediction

2017 
Real-world datasets are often imbalanced, with an important class having far fewer examples than the other classes. In medical data, normal examples typically greatly outnumber disease examples. A classifier learned from imbalanced data tends to be very good at predicting examples in the larger (normal) class, yet the smaller (disease) class is typically of more interest. Imbalance is dealt with at the feature vector level (creating synthetic feature vectors or discarding some examples from the larger class) or by assigning differential costs to errors. Here, we introduce a novel method for over-sampling minority class examples at the image level rather than the feature vector level. Our method was applied to the problem of glioblastoma patient survival group prediction. Synthetic minority class examples were created by adding Gaussian noise to original medical images from the minority class. Uniform local binary pattern (LBP) histogram features were then extracted from the original and synthetic image examples and classified with a random forests classifier. Experimental results show that the new method (Image SMOTE) increased minority class predictive accuracy and also the AUC (area under the receiver operating characteristic curve), compared to using the imbalanced dataset directly or creating synthetic feature vectors.
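The image-level over-sampling and feature extraction steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes 2D grayscale images with intensities in 0..255, and the helper names (gaussian_oversample, uniform_lbp_histogram, build_balanced_features), the noise level sigma, and the LBP configuration (8 neighbours on a square 3x3 neighbourhood, rotation-invariant uniform mapping with 10 bins, no interpolation of diagonal samples) are hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def gaussian_oversample(images, n_new, sigma=5.0, rng=None):
    """Create n_new synthetic images by adding zero-mean Gaussian noise to
    randomly chosen minority-class images. sigma is a hypothetical value."""
    rng = np.random.default_rng(rng)
    picks = rng.integers(0, len(images), size=n_new)
    synthetic = []
    for i in picks:
        img = np.asarray(images[i], dtype=np.float64)
        noisy = img + rng.normal(0.0, sigma, size=img.shape)
        # Keep intensities in the assumed 0..255 range
        synthetic.append(np.clip(noisy, 0.0, 255.0))
    return synthetic


def uniform_lbp_histogram(img):
    """Normalised rotation-invariant uniform LBP histogram (P=8, R=1, 10 bins),
    computed on the 3x3 square neighbourhood of every interior pixel."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # The 8 neighbours, listed in circular order around the centre pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    bits = np.stack([
        (img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= center).astype(np.int32)
        for dy, dx in offsets
    ])  # shape (8, h-2, w-2)
    # Count 0/1 transitions around the circle of neighbours
    transitions = np.sum(bits != np.roll(bits, 1, axis=0), axis=0)
    ones = bits.sum(axis=0)
    # Uniform patterns (<= 2 transitions) get their number of 1s (0..8);
    # all non-uniform patterns share the single label 9
    codes = np.where(transitions <= 2, ones, 9)
    hist = np.bincount(codes.ravel(), minlength=10).astype(np.float64)
    return hist / hist.sum()


def build_balanced_features(majority, minority, sigma=5.0, seed=0):
    """Over-sample the minority class at the image level until both classes
    have the same size, then extract LBP histograms from every image.
    Labels: 0 = majority class, 1 = minority class."""
    n_new = len(majority) - len(minority)
    synthetic = gaussian_oversample(minority, n_new, sigma=sigma, rng=seed)
    all_images = list(majority) + list(minority) + synthetic
    X = np.array([uniform_lbp_histogram(im) for im in all_images])
    y = np.array([0] * len(majority) + [1] * (len(minority) + n_new))
    return X, y
```

The resulting feature matrix could then be passed to a random forests classifier (for example scikit-learn's `RandomForestClassifier`) and scored with `roc_auc_score`. When evaluating such a pipeline, synthetic images should be generated only from training-fold minority images, so that noisy copies of a test image never appear in the training data.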