Density-based clustering with side information and active learning

2017 
Data clustering is one of the most important tasks in machine learning and data mining; it aims to discover the structure of a data set and the relationships between its observations. In many situations, side information about the clusters is available in addition to the feature values. For example, the cluster labels of some observations may be known (called seeds), or certain pairs of observations may be known to belong, or not to belong, to the same cluster (pairwise constraints). Many semi-supervised clustering algorithms have been proposed in the literature to improve clustering accuracy by exploiting such side information effectively. However, each algorithm usually uses only one kind of side information. In this paper, we propose a new semi-supervised density-based clustering algorithm, named MCSSDBS, which integrates both kinds of side information and embeds an active learning strategy in the process of finding clusters. Experiments conducted on real data sets from the UCI repository show the effectiveness of our algorithm compared with semi-supervised density-based clustering (SSDBSCAN).
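The abstract does not say how MCSSDBS combines seeds with pairwise constraints. The following is a minimal sketch of one common convention, not the authors' method. It assumes that two seeds with the same label imply a must-link constraint and two seeds with different labels imply a cannot-link constraint, and that these are pooled with any explicitly given pairs. All function names (`seeds_to_constraints`, `merge_side_information`, `count_violations`) are hypothetical and used only for illustration.

```python
from itertools import combinations


def seeds_to_constraints(seeds):
    """Derive the pairwise constraints implied by seed labels (assumed convention).

    seeds: dict mapping observation index -> known cluster label.
    Returns (must_link, cannot_link) as sets of (i, j) pairs with i < j.
    """
    must_link, cannot_link = set(), set()
    # Sorting by index guarantees i < j for every generated pair.
    for (i, label_i), (j, label_j) in combinations(sorted(seeds.items()), 2):
        if label_i == label_j:
            must_link.add((i, j))
        else:
            cannot_link.add((i, j))
    return must_link, cannot_link


def _normalize(pairs):
    """Store each unordered pair as a sorted tuple so (2, 1) equals (1, 2)."""
    return {tuple(sorted(p)) for p in pairs}


def merge_side_information(seeds, must_link, cannot_link):
    """Pool seed-implied constraints with explicitly given ones.

    Only direct contradictions are detected; transitive closure of
    must-link constraints is deliberately not computed in this sketch.
    """
    ml_seed, cl_seed = seeds_to_constraints(seeds)
    ml = _normalize(must_link) | ml_seed
    cl = _normalize(cannot_link) | cl_seed
    conflicts = ml & cl
    if conflicts:
        raise ValueError(f"contradictory constraints: {sorted(conflicts)}")
    return ml, cl


def count_violations(labels, must_link, cannot_link):
    """Count constraints broken by a candidate clustering (labels[i] = cluster id)."""
    ml_broken = sum(labels[i] != labels[j] for i, j in must_link)
    cl_broken = sum(labels[i] == labels[j] for i, j in cannot_link)
    return ml_broken + cl_broken


if __name__ == "__main__":
    seeds = {0: "A", 3: "A", 5: "B"}
    ml, cl = merge_side_information(seeds, must_link=[(2, 1)], cannot_link=[(4, 0)])
    print(sorted(ml))  # [(0, 3), (1, 2)]
    print(sorted(cl))  # [(0, 4), (0, 5), (3, 5)]
    print(count_violations([0, 1, 1, 0, 1, 2], ml, cl))  # 0
```

Turning both kinds of side information into a single constraint set is only one possible design. A density-based method could instead use seeds directly as cluster starting points and consult pairwise constraints only when expanding or merging clusters.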