Sparker: Optimizing Spark for Heterogeneous Clusters

2018 
Spark is an in-memory big data analytics framework that has replaced Hadoop as the de facto standard for processing big data on cloud platforms. Such platforms are commonly heterogeneous: heterogeneity is introduced by the failure, addition, or upgrade of nodes, and it manifests as variation in the number of CPU cores, the amount of memory, and disk read/write latencies across nodes. These factors have a significant impact on the performance of Spark jobs. Spark executes a job on equal-sized executors, which can result in under-allocation of resources in a heterogeneous cluster; insufficient resources can severely degrade the performance of CPU- and memory-intensive applications such as machine learning and graph processing. In this paper, we propose Sparker, an efficient resource-aware optimization strategy for Spark in heterogeneous clusters. Sparker overcomes the limitations imposed by CPU and memory heterogeneity by re-sizing executors: each executor's size is derived from the available resources of the node that hosts it. We have modified the Spark source code to incorporate this executor re-sizing strategy. Experimental evaluation on the SparkBench benchmark shows that our approach reduces execution time by up to 46%.
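The abstract does not detail Sparker's sizing algorithm, so the following is only a minimal sketch of the general idea: instead of one uniform executor size, derive a per-node executor size from that node's available cores and memory. All names and parameters here (NodeResources, ExecutorSpec, MaxCoresPerExecutor, the 10% memory overhead reservation) are illustrative assumptions, not values or code from the paper.

```scala
// Hypothetical sketch of resource-aware executor sizing in a heterogeneous
// cluster. Each node reports its available resources, and executor sizes are
// computed per node rather than fixed cluster-wide.

case class NodeResources(hostname: String, availableCores: Int, availableMemMb: Long)
case class ExecutorSpec(hostname: String, cores: Int, memMb: Long)

object ExecutorSizer {
  // Illustrative knobs (assumptions, not from the paper):
  val MaxCoresPerExecutor = 5      // cap on cores per executor
  val MemOverheadFraction = 0.10   // reserve ~10% of memory for OS/daemons

  def sizeExecutors(nodes: Seq[NodeResources]): Seq[ExecutorSpec] =
    nodes.flatMap { node =>
      // Memory usable by executors after reserving the overhead fraction.
      val usableMemMb = (node.availableMemMb * (1.0 - MemOverheadFraction)).toLong
      // Number of executors this node can host, bounded by its own cores;
      // a small node still gets one (smaller) executor instead of none.
      val numExecutors = math.max(1, node.availableCores / MaxCoresPerExecutor)
      val coresPerExec = node.availableCores / numExecutors
      val memPerExec   = usableMemMb / numExecutors
      Seq.fill(numExecutors)(ExecutorSpec(node.hostname, coresPerExec, memPerExec))
    }
}
```

Under this sketch, a 16-core node would host three 5-core executors while a 4-core node would host a single 4-core executor, so no node is forced to either waste resources or over-commit to a one-size-fits-all executor.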