EHMM-CT : An Online Method for Failure Prediction in Cloud Computing Systems

2016 
The current cloud computing paradigm is still vulnerable to a significant number of system failures. The increasing demand for fault tolerance and resilience in a cost-effective and device-independent manner is a primary reason for creating an effective means to address system dependability and availability concerns. This paper focuses on online failure prediction for cloud computing systems using system runtime data, which is different from traditional tolerance techniques that require an in-depth knowledge of underlying mechanisms. A ‘failure prediction’ approach, based on Cloud Theory (CT) and the Hidden Markov Model (HMM), is proposed that extends the HMM by training with CT. In the approach, the parameter ω is defined as the correlations between various indices and failures, taking into account multiple runtime indices in cloud computing systems. Furthermore, the approach uses multiple dimensions to describe failure prediction in detail by extending parameters of the HMM. The likelihood and membership degree computing algorithms in the CT are used, instead of traditional algorithms in HMM, to reduce computing overhead in the model training phase. Finally, the results from simulations show that the proposed approach provides very accurate results at low computational cost. It can obtain an optimal tradeoff between ‘failure prediction’ performance and computing overhead.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    1
    Citations
    NaN
    KQI
    []