A comparative cost analysis of fault-tolerance mechanisms for availability on the cloud

2017 
Abstract

As data centres continue to grow in size and complexity to meet the increasing demand for computing resources, failures become the norm rather than the exception. To provide dependability at scale, traditional fault-tolerance techniques rely on reactive and redundant schemes. Reactive schemes checkpoint a job and restart it after a failure, which can incur significant overhead in a large-scale system; redundant schemes replicate tasks, consuming extra resources to achieve higher reliability and availability of computing environments. Proactive fault-tolerance represents a new trend for avoiding, coping with and recovering from failures in large systems. However, different fault-tolerance schemes provide different levels of dependability at different costs to both providers and consumers. In this paper, two state-of-the-art fault-tolerance techniques are compared in terms of the availability of computing environments to cloud consumers and the energy costs to cloud providers. The results show that proactive fault-tolerance outperforms traditional redundancy in terms of cost to cloud users while still providing available computing environments and services to consumers. However, the dependability that proactive fault-tolerance provides depends heavily on failure prediction accuracy.
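The trade-off summarised above can be made concrete with a toy model. The sketch below is not the paper's cost model; it is an illustration under stated assumptions: nodes fail independently with probability `p_fail` during one job run, k-way active replication keeps the job available if any replica survives while every replica consumes the full job energy, and proactive fault-tolerance uses a hypothetical failure predictor described by its recall and false-alarm rate, where each alarm triggers a migration costing a fixed fraction of the job's energy and every unpredicted failure counts as unavailability. All names (`Scenario`, `replication`, `proactive`) and parameter values are illustrative, and checkpoint/restart is deliberately left out.

```python
"""Toy comparison of replication vs. proactive migration (illustrative only).

All parameters and formulas are simplifying assumptions made for this sketch;
they are not the cost model used in the paper.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    p_fail: float      # assumed probability that a node fails during one job run (independent)
    job_energy: float  # energy to run the job once on one node (arbitrary units)


def replication(s: Scenario, replicas: int) -> tuple[float, float]:
    """Return (availability, expected energy) for k-way active replication.

    The job stays available if at least one replica survives; every replica
    consumes the full job energy whether or not it is needed.
    """
    availability = 1.0 - s.p_fail ** replicas
    energy = replicas * s.job_energy
    return availability, energy


def proactive(s: Scenario, recall: float, false_alarm: float,
              migration_cost: float) -> tuple[float, float]:
    """Return (availability, expected energy) for prediction-driven migration.

    recall: fraction of real failures the hypothetical predictor flags in time.
    false_alarm: probability of flagging a node that would not have failed.
    migration_cost: energy of one migration, as a fraction of job_energy.
    Failures the predictor misses are counted as unavailability.
    """
    availability = 1.0 - s.p_fail * (1.0 - recall)
    # Expected number of migrations per job: true alarms plus false alarms.
    alarm_rate = s.p_fail * recall + (1.0 - s.p_fail) * false_alarm
    energy = s.job_energy * (1.0 + migration_cost * alarm_rate)
    return availability, energy


if __name__ == "__main__":
    s = Scenario(p_fail=0.05, job_energy=100.0)
    print("replication x2:", replication(s, 2))
    for r in (0.5, 0.9, 0.99):
        print(f"proactive recall={r}:",
              proactive(s, r, false_alarm=0.02, migration_cost=0.1))
```

With these assumed numbers, proactive migration costs roughly half the energy of two-way replication, but its availability, 1 - p_fail(1 - recall), only matches replication's 1 - p_fail^2 = 0.9975 once recall reaches 0.95. This mirrors, in a deliberately simplified form, the abstract's point that the dependability of proactive schemes hinges on prediction accuracy.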