Analysis of parallel computational models for clustering

2018 
Clustering is one of the main tasks of data mining: it discovers groups of similar objects and detects outliers. Processing huge datasets requires scalable computational models and distributed computing environments, and therefore efficient parallel clustering methods. Parallel data analytics usually relies on the MapReduce processing model. However, the growing computing power of heterogeneous platforms based on graphics processors and FPGA accelerators makes the CUDA and OpenCL models an interesting alternative to MapReduce. This paper presents a comparative analysis of the effectiveness of applying the MapReduce and CUDA/OpenCL processing models to clustering. We compare different clustering methods in terms of how well they can be parallelized under both models of computation. The conclusions indicate directions for further work in this area.