Provisioning data analytic workloads in a cloud

2013 
Data analytics applications are well suited to a cloud environment. In this paper we examine the problem of provisioning resources in a public cloud to execute data analytic workloads. The goal of our provisioning method is to determine the most cost-effective configuration for a given data analytic workload. Provisioning a workload in a public cloud environment poses several challenges: it is difficult to develop accurate performance prediction models using standard methods; the space of possible configurations is very large, so exact solutions cannot be determined efficiently; and the mix and intensity of query classes in a workload vary dynamically over time. We provide a formulation of the provisioning problem and then define a framework to solve it. Our framework contains a cost model to predict the cost of executing a workload on a configuration and a method of selecting configurations. The cost model balances resource costs and penalties from SLAs. The specific resource demands and frequencies are accounted for by queueing network models of the Virtual Machines (VMs), which are used to predict performance. We evaluate our approach experimentally using sample data analytic workloads on Amazon EC2.
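To make the shape of such a cost model concrete, the following is a minimal sketch and not the paper's method. It assumes a single M/M/1 queue per VM in place of the paper's queueing network models, a flat per-query SLA penalty, and an exhaustive search over a small candidate list. All names (`Config`, `response_time`, `hourly_cost`, `cheapest`) and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A candidate configuration (all fields are illustrative assumptions)."""
    name: str
    vms: int             # number of identical VMs
    service_rate: float  # queries per hour one VM can serve
    price: float         # dollars per VM-hour


def response_time(cfg: Config, arrival_rate: float) -> float:
    """Mean M/M/1 response time in hours, with load split evenly across VMs."""
    per_vm = arrival_rate / cfg.vms
    if per_vm >= cfg.service_rate:
        return float("inf")  # queue is unstable: response time grows without bound
    return 1.0 / (cfg.service_rate - per_vm)


def hourly_cost(cfg: Config, arrival_rate: float,
                sla_time: float, penalty: float) -> float:
    """Resource cost plus an assumed per-query penalty when the SLA is missed."""
    resource = cfg.vms * cfg.price
    late = response_time(cfg, arrival_rate) > sla_time
    return resource + (penalty * arrival_rate if late else 0.0)


def cheapest(configs, arrival_rate: float, sla_time: float, penalty: float) -> Config:
    """Pick the lowest-cost candidate by exhaustive search (small lists only)."""
    return min(configs, key=lambda c: hourly_cost(c, arrival_rate, sla_time, penalty))


# Hypothetical candidates and workload: 90 queries/hour, SLA of 0.05 hours.
small = Config("small", vms=1, service_rate=100.0, price=0.10)
medium = Config("medium", vms=2, service_rate=100.0, price=0.10)
best = cheapest([small, medium], arrival_rate=90.0, sla_time=0.05, penalty=0.01)
print(best.name)
```

With these made-up numbers the single-VM candidate misses the SLA and pays the penalty, so the two-VM candidate is cheaper overall despite its higher resource cost. Exhaustive search is only workable here because the candidate list is tiny; as the abstract notes, the real configuration space is too large for exact solutions.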