Decentralized K-Means Using Randomized Gossip Protocols for Clustering Large Datasets

Jérôme Fellus, David Picard, Philippe Henri Gosselin

2013

This paper addresses the clustering of very large datasets distributed over a network of computational units, using a decentralized K-means algorithm. To obtain the same codebook at every node of the network, we use a randomized gossip aggregation protocol in which only small messages are exchanged. We show theoretically that the algorithm is equivalent to a centralized K-means, provided that the number of messages each node sends meets a given bound. Experiments show that consensus is reached when the number of messages is consistent with this bound, and also with fewer messages, although the objective function then evolves less smoothly.
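
The abstract does not spell out the protocol, so the following is only a minimal sketch of how one gossip-based K-means update could look, not the authors' exact method. It assumes a push-style aggregation in which a random node halves its per-cluster sums and counts and sends one half to a random peer; each node then estimates a centroid as its current sum divided by its current count. All function names (`local_stats`, `gossip_round`, `decentralized_kmeans_step`) and the message budget `n_messages` are hypothetical.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))

def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts computed from one node's data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0.0] * len(centroids)
    for p in points:
        k = nearest(p, centroids)
        counts[k] += 1.0
        for j in range(dim):
            sums[k][j] += p[j]
    return sums, counts

def gossip_round(stats, rng):
    """One message: a random sender keeps half of its statistics and
    pushes the other half to a random peer (assumed protocol)."""
    i, j = rng.sample(range(len(stats)), 2)
    (si, ci), (sj, cj) = stats[i], stats[j]
    for k in range(len(ci)):
        ci[k] *= 0.5
        cj[k] += ci[k]
        for d in range(len(si[k])):
            si[k][d] *= 0.5
            sj[k][d] += si[k][d]

def decentralized_kmeans_step(node_data, node_centroids, n_messages, rng):
    """One K-means update: local assignment, gossip, then local re-estimation."""
    stats = [local_stats(pts, cents)
             for pts, cents in zip(node_data, node_centroids)]
    for _ in range(n_messages):
        gossip_round(stats, rng)
    updated = []
    for (sums, counts), old in zip(stats, node_centroids):
        # Keep the previous centroid if this node holds no mass for a cluster.
        updated.append([[s / c for s in sums[k]] if c > 0 else list(old[k])
                        for k, c in enumerate(counts)])
    return updated
```

In this sketch, the total of all sums and counts over the network is preserved by every message, so with a generous message budget each node's ratio approaches the centroid a centralized K-means step would compute on the pooled data. With a small budget, the estimates may still differ from node to node.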

Keywords:

Correlation clustering
Machine learning
Gossip protocol
Artificial intelligence
Cluster analysis
Data stream clustering
Codebook
k-means clustering
Canopy clustering algorithm
CURE data clustering algorithm
Computer science
Distributed computing
Theoretical computer science
Gossip
