Efficient Cluster-Based Boosting for Semisupervised Classification

2018 
Semisupervised classification (SSC) consists of using both labeled and unlabeled data to classify unseen instances. Due to the large amount of unlabeled data typically available, SSC algorithms must be able to handle large-scale data sets. Recently, various ensemble algorithms have been introduced with improved generalization performance when compared to single classifiers. However, existing ensemble methods are not able to handle typical large-scale data sets. We propose efficient cluster-based boosting (ECB), a multiclass SSC algorithm with cluster-based regularization that avoids generating decision boundaries in high-density regions. A semisupervised selection procedure reduces time and space complexities by selecting only the most informative unlabeled instances for the training of each base learner. We provide evidence that ECB achieves good performance with small amounts of selected data and a relatively small number of base learners. Our experiments confirmed that ECB scales to large data sets while delivering generalization comparable to state-of-the-art methods.