LXCloudFT: Towards high availability, fault tolerant Cloud system based Linux Containers

2018 
Abstract
Infrastructure-as-a-Service container-based virtualization is gaining interest as a platform for running distributed applications. As Cloud architectures grow in scale, faults become frequent, which makes availability a challenge. LXCloudFT is a fault-tolerant Cloud system composed of LXCloud-CR, a Checkpoint–Restart model, and GC-CR, a garbage collector component that eliminates old container snapshots. LXCloudFT was originally designed for scientific applications, and all its components are decentralized. We want to adapt it to serve stateless, loosely coupled applications such as web applications, for which replication is a common way to survive failures. This paper addresses the issue of replication and contributes a novel replication model, LXCloud-Rep, to LXCloudFT. LXCloud-Rep is a replication model with versioning and garbage collection that replicates Linux Container instances on several nodes in a decentralized manner. Following a node failure, LXCloud-Rep restarts the failed containers on a new node from distributed container images rather than from snapshots. It also optimizes the use of storage space. Large-scale experiments on Grid’5000 show that it improves the performance of applications.