Constrained Semi-Supervised Growing Self-Organizing Map

2015 
Abstract Semi-supervised clustering tries to surpass the limits of unsupervised clustering by using the extra information contained in occasional labeled data points. However, providing such labeled samples is not always possible or easy in real-world applications. A weaker, yet still very useful, option is to provide constraints on the unlabeled training samples, which is the focus of Constrained Semi-Supervised (CSS) clustering. Meanwhile, online learning has gained considerable interest for real-world problems with massive sample sizes or streaming behavior, since limited memory and computational resources seriously restrict the application of offline and batch methods. However, the existing algorithms for the online CSS clustering problem either assume that the entire dataset is available and add constraints incrementally, or consider chunks of constrained data points and apply an offline CSS clustering algorithm to them. Thus, none of them can be categorized as a genuine online CSS clustering algorithm. In this paper, we propose CS2GS, an online CSS clustering algorithm. CS2GS is constructed by modifying the online learning process of the Semi-Supervised Growing Self-Organizing Map and converting it into an iterative constrained metric learning problem that can be solved using Bregman's iterative projections. The proposed CS2GS is studied via a series of thorough tests using synthetic and real data, including selections from the UCI datasets and FEP, a recent bilingual corpus used for the sentence alignment stage of machine translation. Experimental results show the effectiveness of CS2GS in online CSS clustering and demonstrate that the limits of the system accuracy may indeed be pushed higher using unlabeled samples.