Joint exploring of risky labeled and unlabeled samples for safe semi-supervised clustering

2021 
Abstract In the past few years, Safe Semi-Supervised Learning (S3L) has become an emerging research topic. Several S3L methods have been proposed and have achieved the desired performance. However, these studies mainly focus on classification problems, and clustering has received less attention. Moreover, no existing study takes both risky labeled and risky unlabeled samples (e.g., mislabeled samples and outliers) into consideration. Therefore, we propose a novel safe semi-supervised clustering method that safely explores both labeled and unlabeled samples. First, we compute a Safe Degree (SD) for each labeled and unlabeled sample by estimating its local density and minimum distance. If a sample has a large local density and a small minimum distance, it is considered safe and its SD should be high; otherwise, the sample is considered risky and its SD should be low. The SD is then introduced into a model-based semi-supervised clustering method to reduce the negative influence of risky labeled and unlabeled samples. Additionally, we construct a graph-based regularization term that constrains the outputs of risky labeled samples to agree with those of their nearest unlabeled neighbors, which is expected to further reduce the harm caused by risky labeled samples. An illustration on an artificial dataset is given to explain the usefulness of the defined SD. Finally, results on ten UCI datasets show that our algorithm achieves good clustering performance.
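
The abstract does not give the SD formula, so the following is only a minimal sketch of the idea that a large local density and a small minimum distance should yield a high SD. Everything specific here is an assumption for illustration: the hypothetical function name `safe_degree`, the Gaussian-kernel density with a `bandwidth` parameter, taking the minimum distance as the distance to the nearest sample of higher density (falling back to the nearest-neighbor distance for the densest samples), and combining the two min-max-scaled quantities by a product.

```python
import numpy as np


def safe_degree(X, bandwidth=1.0):
    """Illustrative safe degree in [0, 1] for each row of X (needs >= 2 samples).

    Assumed rule (not taken from the paper): SD is high when the local
    density is large and the minimum distance is small.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density: sum of Gaussian kernel values over the other samples.
    kernel = np.exp(-(dist / bandwidth) ** 2)
    np.fill_diagonal(kernel, 0.0)
    rho = kernel.sum(axis=1)

    # Minimum distance: distance to the nearest sample with higher density.
    # Samples with no denser neighbor use their nearest-neighbor distance
    # instead (an assumption made for this sketch).
    dist_no_self = dist.copy()
    np.fill_diagonal(dist_no_self, np.inf)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist_no_self[i].min()

    def scale(v):
        # Min-max scale to [0, 1]; a constant vector maps to all zeros.
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    # High density and small minimum distance give a value close to 1.
    return scale(rho) * (1.0 - scale(delta))


if __name__ == "__main__":
    # Four tightly grouped points and one distant outlier.
    points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
    print(safe_degree(points))
```

In this toy example the isolated point receives the lowest value, which is the behavior one would want before using such a score to down-weight risky samples in a clustering objective.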