Performance Evaluation of Big Data Frameworks: MapReduce and Spark

2018 
Spark and MapReduce are two prominent open-source distributed computing frameworks for big data processing and analytics. These frameworks introduce a simple programming APIs for new users and suppress the complication and fault tolerance of distributed tasks. Most of Internet companies widely deploy these frameworks to process their massive data. Furthermore, all other big communities are adopting these HPC because high-performance data analytics is required to solve big data problems. To provide an efficient framework for processing and analyzing large amount of data, today’s researchers correlate both the frameworks. (1) This paper discusses the evaluation of the performance of MapReduce and Spark on page rank, sort and word count. From some existing research, we evaluate page rank and sort algorithms in these frameworks. (2) We provide in-depth analysis of task execution time on word count algorithm in both of these frameworks, through detailed experiment and quantify the performance based on different dataset sizes. Overall experimental results show that Spark is faster than MapReduce. The prime causes of speedups in Spark are the reduced DISK and CPU overheads due to RDD cashing.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    0
    Citations
    NaN
    KQI
    []