Implementing a Platform to Run Clustering Algorithms Using Distributed Computing

2019 
Most clustering algorithms are designed as sequential algorithms that require all of the data to be present, which restricts implementations to a single machine and rules out horizontal scaling. This is problematic today, as data volumes grow daily and fast processing is essential. Hence, in this paper we propose a platform that runs clustering algorithms in a distributed manner. This is achieved by splitting the data into smaller, equally sized partitions and by redesigning the original clustering algorithms so that each one can work on a subset of the input data without interacting with the processing of the rest. Finally, the so-called reduce phase aggregates the partial results obtained from each partition and produces the global result.
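The partition, independent processing, and reduce steps described in the abstract can be illustrated with a minimal single-process sketch. It is not the platform proposed in the paper: k-means is chosen here only as an example algorithm, the reduce strategy (re-clustering the local centroids, weighted by how many points each represents) is an assumption, and every function name is hypothetical. In a real deployment each `map_phase` call would run on a separate worker.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Summary = Tuple[Point, float]  # (centroid, number of points it represents)


def split_into_partitions(data: Sequence[Point], n_parts: int) -> List[List[Point]]:
    """Split data into n_parts contiguous partitions whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(list(data[start:end]))
        start = end
    return parts


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float], k: int,
                    iters: int = 20, seed: int = 0) -> List[Summary]:
    """Plain Lloyd iterations in which each point carries a weight."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # needs at least k points
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest current centroid.
            j = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Move each centroid to the weighted mean of its points;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centroids[j]
            for j in range(k)
        ]
    return [(centroids[j], totals[j]) for j in range(k)]


def map_phase(partition: Sequence[Point], k: int) -> List[Summary]:
    """Cluster one partition without looking at any other partition."""
    return weighted_kmeans(partition, [1.0] * len(partition), k)


def reduce_phase(partials: Sequence[Sequence[Summary]], k: int) -> List[Point]:
    """Aggregate the partial results by clustering the weighted local centroids."""
    merged = [s for part in partials for s in part if s[1] > 0]
    points = [c for c, _ in merged]
    weights = [w for _, w in merged]
    return [c for c, _ in weighted_kmeans(points, weights, k)]


def distributed_kmeans(data: Sequence[Point], k: int, n_parts: int) -> List[Point]:
    partials = [map_phase(p, k) for p in split_into_partitions(data, n_parts)]
    return reduce_phase(partials, k)
```

Because only the local centroids and their counts reach the reduce phase, the data exchanged between workers is proportional to the number of partitions times `k` rather than to the size of the input. The result is an approximation of what a single-machine run would produce, not a guaranteed match.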