Parallel K-means Clustering Algorithm on NOWs

Sanpawat Kantabutra, Alva L. Couch, Mary Inaba, Naoki Katoh

1999

Despite its simplicity and its linear time complexity, the serial K-means algorithm remains expensive when it is applied to a large set of multidimensional vectors. In this paper we show that applying parallel computing techniques to the algorithm improves its running time by a factor of O(K/2), where K is the number of desired clusters. In addition to the time improvement, the parallel version of the K-means algorithm can use the larger collective memory of multiple machines when the memory of a single machine is insufficient to solve a problem. We show that the problem size can be scaled up to O(K) times the problem size solvable on a single machine.
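To make the idea of parallelizing K-means concrete, here is a minimal sketch of one common data-parallel scheme. It is an assumption for illustration and not necessarily the partitioning the authors use: the points are split into chunks, each worker assigns its own points to the nearest centroid and returns per-cluster partial sums and counts, and a master combines those partials into new centroids. The names `parallel_kmeans`, `local_step`, and `nearest` are hypothetical. The workers run sequentially here so the example stays self-contained. On a network of workstations, each chunk would live on a separate machine, so only a worker's own chunk plus the K centroids must fit in that machine's memory. The sketch does not attempt to reproduce the O(K/2) speedup claimed above.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(point: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for j, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = j, d
    return best


def local_step(chunk: Sequence[Sequence[float]], centroids: Sequence[Vector]):
    """Hypothetical per-worker task: per-cluster coordinate sums and point counts for one chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x in chunk:
        j = nearest(x, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return sums, counts


def parallel_kmeans(points, initial_centroids, workers: int = 4, max_iter: int = 100) -> List[Vector]:
    """Data-parallel K-means sketch: workers compute partial sums, the master reduces them."""
    # Round-robin partition; on a NOW each chunk would be stored on a different machine.
    chunks = [points[i::workers] for i in range(workers)]
    centroids = [tuple(c) for c in initial_centroids]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iter):
        # In a real deployment these calls would be dispatched to remote workers.
        partials = [local_step(ch, centroids) for ch in chunks]
        # Master: combine the partial sums and counts from every worker.
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        for sums, counts in partials:
            for j in range(k):
                total_counts[j] += counts[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
        # An empty cluster keeps its previous centroid.
        new = [
            tuple(s / total_counts[j] for s in total_sums[j]) if total_counts[j] else centroids[j]
            for j in range(k)
        ]
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids
```

For example, `parallel_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], [(0, 0), (10, 10)], workers=2)` converges to centroids near (1/3, 1/3) and (31/3, 31/3). The result does not depend on the number of workers, apart from floating-point rounding. Only the sums and counts, O(K·d) numbers per worker, need to travel to the master on each iteration.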

Keywords:

Parallel computing
Computational geometry
Parallel algorithm
Cluster analysis
Data stream clustering
k-means clustering
Time complexity
Canopy clustering algorithm
CURE data clustering algorithm
Theoretical computer science
Computer science
Cluster (physics)
Algorithm