Cost-sensitive classification on class-balanced ensembles for imbalanced non-coding RNA data

2016 
Many bioinformatics data sets are class-imbalanced: the number of samples in each class is not equal. Most such data sets contrast usual with unusual cases, e.g. cancer versus normal or miRNAs versus other non-coding RNA, and the minority class, which has the fewest samples, is the class of interest because it contains the unusual cases. Learning models based on standard classifiers, such as the support vector machine (SVM), random forest and k-NN, are usually biased towards the majority class, so the classifier is likely to misclassify samples from the class of interest. Handling class-imbalanced data sets has therefore attracted growing research interest. A combination of proper feature selection, a cost-sensitive classifier and random-forest-based ensembling (BCE-CSC-RF) is proposed to handle class-imbalanced data. Random class-balanced ensembles are built individually. Each ensemble is then used as a training pool to classify the remaining out-of-bag samples. Samples in each ensemble are classified using a cost-sensitive classifier that incorporates random forest. Each sample is assigned the class that received the most votes across all of its appearances in the formed ensembles. A set of performance measurements, including a geometric measurement, suggests that the model can improve the classification of minority-class samples.
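The ensemble-and-vote procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`balanced_bags`, `vote_out_of_bag`, `fit_nearest_centroid`, `g_mean`), the `frac` parameter and the nearest-centroid stand-in classifier are all hypothetical, chosen only to keep the example dependency-free. The sketch also assumes that the "geometric measurement" is the geometric mean of sensitivity and specificity, which the abstract does not state.

```python
import math
import random
from collections import Counter, defaultdict


def balanced_bags(labels, n_bags, frac, rng):
    """Yield index lists holding the same number of samples from every class.

    Assumption: each bag draws frac * (minority class size) samples per class,
    so that some minority samples stay out-of-bag and can be voted on.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    n_draw = max(1, int(frac * min(len(idx) for idx in by_class.values())))
    for _ in range(n_bags):
        bag = []
        for idx in by_class.values():
            bag.extend(rng.sample(idx, n_draw))
        yield bag


def fit_nearest_centroid(X, y):
    """Stand-in classifier (NOT a random forest): returns a predict function."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return predict


def vote_out_of_bag(X, y, fit=fit_nearest_centroid, n_bags=25, frac=0.75, seed=0):
    """Train on each balanced bag, predict its out-of-bag samples, majority-vote."""
    rng = random.Random(seed)
    votes = defaultdict(Counter)
    for bag in balanced_bags(y, n_bags, frac, rng):
        in_bag = set(bag)
        predict = fit([X[i] for i in bag], [y[i] for i in bag])
        for i, row in enumerate(X):
            if i not in in_bag:
                votes[i][predict(row)] += 1
    # Samples that were never out-of-bag receive no votes and are omitted.
    return {i: c.most_common(1)[0][0] for i, c in votes.items()}


def g_mean(y_true, y_pred, positive):
    """Geometric mean of sensitivity and specificity (assumed metric)."""
    pairs = list(zip(y_true, y_pred))
    pos = [p for t, p in pairs if t == positive]
    neg = [p for t, p in pairs if t != positive]
    if not pos or not neg:
        return 0.0
    sensitivity = sum(p == positive for p in pos) / len(pos)
    specificity = sum(p != positive for p in neg) / len(neg)
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 40 majority samples near (0, 0), 8 minority samples near (5, 5).
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
    X += [[5 + rng.uniform(-1, 1), 5 + rng.uniform(-1, 1)] for _ in range(8)]
    y = [0] * 40 + [1] * 8
    pred = vote_out_of_bag(X, y)
    voted = sorted(pred)
    print(g_mean([y[i] for i in voted], [pred[i] for i in voted], positive=1))
```

The `fit` argument is where a cost-sensitive random forest would be plugged in. One option, not confirmed by the source, is scikit-learn's `RandomForestClassifier` with its `class_weight` parameter set to penalise minority-class errors more heavily, wrapped so that it returns a per-sample predict function.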