Identifying the potential of near data processing for Apache Spark

Ahsan Javed Awan, Moriyoshi Ohara, Eduard Ayguadé, Kazuaki Ishizaki, Mats Brorsson, Vladimir Vlassov

2017

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to stay at the forefront of big data analytics by being a unified framework for both batch and stream data processing. There is also renewed interest in Near Data Processing (NDP) due to technological advances in the last decade. However, it is not known whether NDP architectures can improve the performance of big data processing frameworks such as Apache Spark. In this paper, we build the case for an NDP architecture comprising programmable-logic-based hybrid 2D integrated processing-in-memory and in-storage processing for Apache Spark, through extensive profiling of Apache Spark based workloads on an Ivy Bridge server.

Keywords:

Big data
Information and Communications Technology
Ivy Bridge
Programmable logic device
Database
Architecture
Spark (mathematics)
Profiling (computer programming)
Computer science
Computer cluster
Operating system
Data processing
