Model driven performance simulation of cloud provisioned Hadoop MapReduce applications

2016 
Hadoop is a widely adopted open-source implementation of MapReduce. A Hadoop cluster can be fully provisioned by a cloud service provider to provide elasticity in computational resource allocation. Understanding the performance characteristics of a Hadoop job can help achieve an optimal balance between resource usage (cost) and job latency on a cloud-based cluster. This paper presents a method that estimates the performance of a MapReduce application in a cloud-provisioned Hadoop cluster. We develop a model-driven approach that defines a cloud-provider-independent Hadoop MapReduce model and customizes it for a specific cloud deployment. These models are then transformed into a simulation model that produces estimates of end-to-end job latency. We explore this method in the design space of MapReduce applications to estimate performance for different sizes of input data. Our approach provides a model-to-simulation-to-prediction method for observing the performance behaviour of MapReduce applications under a given configuration of a MapReduce platform.