Intelligent RDD Management for High Performance In-Memory Computing in Spark

2017 
Spark is a pervasively used in-memory computing framework in the era of big data, and can greatly accelerate the computation speed by wrapping the accessed data as resilient distribution datasets (RDDs) and storing these datasets in the fast accessed main memory. However, the space of main memory is limited, and Spark does not provide an intelligent mechanism to store reasonable RDDs in the limited memory. In this paper, we propose a fine-grained RDD checkpointing and kick-out selection strategy, by which Spark can intelligently select the reasonable RDDs to maximize the memory usage. The experiment is conducted on a server with four nodes. Experimental results demonstrate that the proposed techniques can effectively accelerate the execution speed.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    2
    References
    5
    Citations
    NaN
    KQI
    []