A parallel and accurate method for large-scale image segmentation on a cloud environment

2021 
In this paper, we present a parallel algorithm for SLIC on Apache Spark, which we call PSLIC-on-Spark. To this end, we extend the original SLIC algorithm to use Apache Spark operations so that it can be processed in parallel on multiple executors in an Apache Spark cluster. We then analyze the trade-off in PSLIC-on-Spark between processing speed and accuracy that arises from partitioning the original image datasets, and we verify this trade-off through experiments. Specifically, we show that PSLIC-on-Spark using 8 CPU cores reduces the processing time of SLIC by a factor of 2.24–2.93, while it reduces the boundary recall (BR) of SLIC by 1.54–6.32% and increases the under-segmentation error (UE) by 1.79–6.2%. Next, we propose an improved algorithm, which we call PASLIC-on-Spark, that improves the accuracy of PSLIC-on-Spark. It has two main features: (1) image partitioning that considers the shape and position of the clusters rather than partitioning the image evenly and (2) controllable duplication of the boundary between image partitions. Through experiments, we show the accuracy and efficiency of PASLIC-on-Spark in an actual cloud environment configured with 8 worker nodes on Amazon AWS. The experimental results indicate that PASLIC-on-Spark improves the accuracy of PSLIC-on-Spark by 3.66–3.77% in BR and 1.39–1.96% in UE. PASLIC-on-Spark still reduces the processing time of SLIC significantly: by a factor of 1.5–1.67 on a single-node configuration using 8 CPU cores and by a factor of 1.18–1.26 in a cloud environment using 8 computing nodes.
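The idea of duplicating the boundary between image partitions can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the image is split evenly into horizontal strips (the simple scheme that PASLIC-on-Spark improves on, not the cluster-aware partitioning), and the function name `partition_rows` and the parameter `overlap` are hypothetical names chosen for this illustration.

```python
def partition_rows(height, num_parts, overlap):
    """Split rows [0, height) into horizontal strips with duplicated borders.

    Illustrative assumption: even strips plus `overlap` duplicated rows on
    each interior side. Returns (lo, hi, core_start, core_end) per strip:
    [lo, hi) is the row range a worker would process, and
    [core_start, core_end) is the part it owns in the merged result.
    """
    if num_parts <= 0 or overlap < 0 or height < num_parts:
        raise ValueError("need 0 < num_parts <= height and overlap >= 0")

    base, extra = divmod(height, num_parts)
    strips = []
    start = 0
    for i in range(num_parts):
        # Spread the remainder rows over the first `extra` strips.
        size = base + (1 if i < extra else 0)
        end = start + size
        # Extend the strip into its neighbours, clamped to the image.
        lo = max(0, start - overlap)
        hi = min(height, end + overlap)
        strips.append((lo, hi, start, end))
        start = end
    return strips


if __name__ == "__main__":
    for strip in partition_rows(100, 4, 5):
        print(strip)
```

With `overlap` set to 0 the strips are disjoint. A larger value gives each worker more context near the partition borders at the cost of processing the duplicated rows more than once, which is one way to picture the speed and accuracy trade-off described above.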