Intelligent RDD Management for High Performance In-Memory Computing in Spark

Mingyue Zhang, Renhai Chen, Xiaowang Zhang, Zhiyong Feng, Guozheng Rao, Xin Wang

2017

Spark is a widely used in-memory computing framework for big data. It accelerates computation by wrapping the accessed data as resilient distributed datasets (RDDs) and keeping these datasets in fast main memory. However, main memory is limited, and Spark does not provide an intelligent mechanism for deciding which RDDs to keep in that limited space. In this paper, we propose a fine-grained RDD checkpointing and kick-out selection strategy, by which Spark can intelligently select which RDDs to keep in memory and thereby make the best use of the available memory. Experiments are conducted on a server with four nodes. The results demonstrate that the proposed techniques effectively accelerate execution.
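The abstract does not spell out the selection rule, so the following is only an illustrative sketch of what a kick-out selection could look like, not the authors' algorithm and not part of Spark's API. The names `CachedRDD` and `select_kick_out`, the per-RDD estimates, and the scoring formula are all assumptions made for the example: when memory must be freed, it greedily evicts the cached RDDs that would be cheapest to lose per megabyte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedRDD:
    """Hypothetical bookkeeping record for one cached RDD (not a Spark class)."""
    name: str
    size_mb: float           # memory footprint; assumed to be > 0
    recompute_cost_s: float  # estimated time to rebuild from lineage if evicted
    future_uses: int         # number of later stages that still read this RDD


def select_kick_out(cached, needed_mb):
    """Greedily pick RDDs to evict until at least needed_mb is freed.

    Assumed heuristic: an RDD is worth keeping in proportion to the
    recomputation time it saves (cost x remaining uses) per MB it occupies.
    RDDs with the lowest such value are evicted first.
    """
    def keep_value(rdd):
        return rdd.recompute_cost_s * rdd.future_uses / rdd.size_mb

    victims, freed = [], 0.0
    for rdd in sorted(cached, key=keep_value):
        if freed >= needed_mb:
            break
        victims.append(rdd)
        freed += rdd.size_mb
    return victims


cached = [
    CachedRDD("a", size_mb=100, recompute_cost_s=10, future_uses=0),
    CachedRDD("b", size_mb=200, recompute_cost_s=50, future_uses=3),
    CachedRDD("c", size_mb=50, recompute_cost_s=1, future_uses=1),
]
victims = select_kick_out(cached, needed_mb=120)
print([r.name for r in victims])
```

In this sketch an RDD with no remaining uses is always the first candidate for eviction, while an expensive, frequently reused RDD is kept longest. A real strategy would also have to weigh checkpointing to stable storage against recomputation from lineage, which this example leaves out.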

Keywords:

Computation
In-Memory Processing
Big data
World Wide Web
Computer science
Real-time computing
Spark (mathematics)
selection strategy
