Developing Checkpointing and Recovery Procedures with the Storage Services of Amazon Web Services.

2020 
In recent years, cloud computing has grown in popularity as they give users easy and almost instantaneous access to different computational resources. Some cloud providers, like Amazon, took advantage of the growing popularity and offered their VMs in some different hiring types: on-demand, reserved, and spot. The last type is usually offered at lower prices but can be terminated by the provider at any time. To deal with those failures, checkpoint and recovery procedures are typically used. In this context, we propose and analyze checkpoint and recovery procedures using three different storage services from Amazon: Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS) and Amazon Elastic File System (EFS), considering spot VMs. These procedures were built upon the HADS framework, designed to schedule bag-of-tasks applications to spot and on-demand VMs. Our results showed that EBS outperformed the other approaches in terms of time spent on recording a checkpoint. But it required more time in the recovery procedure. EFS presented checkpointing and recovery times close to EBS but with higher monetary costs than the other services. S3 proved to be the best option in terms of monetary cost but required a longer time for recording a checkpoint, individually. However, when concurrent checkpoints were analysed, which can occur in a real application with lots of tasks, in our tests, S3 outperformed EFS in terms of execution time also.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    11
    References
    4
    Citations
    NaN
    KQI
    []