Semi-supervised Clustering on Heterogeneous Information Networks
2014
Semi-supervised clustering on information networks combines both the labeled and unlabeled data sets with an aim to improve the clustering performance. However, the existing semi-supervised clustering methods are all designed for homogeneous networks and do not deal with heterogeneous ones. In this work, we propose a semi-supervised clustering approach to analyze heterogeneous information networks, which include multi-typed objects and links and may contain more useful semantic information. The major challenge in the clustering task here is how to handle multi-relations and diverse semantic meanings in heterogeneous networks. In order to deal with this challenge, we introduce the concept of relation-path to measure the similarity between two data objects of the same type. Thereafter, we make use of the labeled information to extract different weights for all relation-paths. Finally, we propose SemiRPClus, a complete framework for semi-supervised learning in heterogeneous networks. Experimental results demonstrate the distinct advantages in effectiveness and efficiency of our framework in comparison with the baseline and some state-of-the-art approaches.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
25
References
29
Citations
NaN
KQI