Scalable ensemble information-theoretic co-clustering for massive data

2012 
Co-clustering is effective for simultaneously clustering the rows and columns of a data matrix. Yet different co-clustering models usually produce very different results. In this paper, we propose a scalable algorithm that co-clusters massive, sparse, and high-dimensional data and combines the individual clustering results to produce a better final result. Our algorithm is particularly well suited to distributed computing environments, as our experiments show, and it is implemented on the Hadoop platform with the MapReduce programming framework. Experimental results on several real and synthetic data sets demonstrate that the proposed algorithm achieves higher accuracy than other clustering algorithms and scales well.
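As a rough illustration of what "information-theoretic" co-clustering usually means, the sketch below scores a given row and column partition by the mutual information lost when the joint distribution is aggregated into co-clusters. It assumes the standard loss I(X;Y) - I(X̂;Ŷ) from the information-theoretic co-clustering literature; the abstract does not state the paper's exact objective, and the function names (`mutual_information`, `coclustered_joint`, `information_loss`) are hypothetical. It is a single-machine NumPy sketch, not the Hadoop/MapReduce implementation, and it does not cover the ensemble step.

```python
import numpy as np


def mutual_information(p):
    """Mutual information (in nats) of a joint distribution given as a 2-D array."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                      # normalize counts to a joint distribution
    px = p.sum(axis=1, keepdims=True)    # row marginal, shape (n, 1)
    py = p.sum(axis=0, keepdims=True)    # column marginal, shape (1, m)
    mask = p > 0                         # skip zero cells, where 0 * log 0 = 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))


def coclustered_joint(p, row_labels, col_labels, k, l):
    """Aggregate the joint distribution into a k x l co-cluster distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    rows = np.asarray(row_labels)[:, None]   # shape (n, 1)
    cols = np.asarray(col_labels)[None, :]   # shape (1, m)
    q = np.zeros((k, l))
    np.add.at(q, (rows, cols), p)            # sum every cell into its co-cluster
    return q


def information_loss(p, row_labels, col_labels, k, l):
    """Assumed objective: mutual information lost by the co-clustering."""
    q = coclustered_joint(p, row_labels, col_labels, k, l)
    return mutual_information(p) - mutual_information(q)


# Toy block-diagonal matrix: two row groups and two column groups.
counts = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
good = information_loss(counts, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad = information_loss(counts, [0, 1, 0, 1], [0, 0, 1, 1], 2, 2)
```

On this toy matrix the partition that matches the block structure loses no mutual information, while the row partition that mixes the two blocks loses all of it, so a lower loss indicates a better co-clustering under this assumed objective.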