Reliability and Performance Analysis of Architecture-Based Software Implementing Restarts and Retries Subject to Correlated Component Failures

2015 
High reliability and performance are essential attributes of software systems designed for critical real-time applications. To improve both, many systems incorporate some form of fault recovery mechanism. However, contemporary models of software reliability and performance rarely consider these mechanisms. Another notable shortcoming of many software models is the simplifying assumption that component failures are statistically independent, an assumption contradicted by several experimental studies showing that software component failures can be correlated. This paper presents an architecture-based model of software reliability and performance that explicitly considers a two-stage fault recovery mechanism implementing component restarts and application-level retries. The application architecture is characterized by a discrete-time Markov chain (DTMC) that represents the dynamic branching of control between the components of the application. Correlations between component failures are computed with an efficient numerical algorithm for a multivariate Bernoulli (MVB) distribution. We illustrate the utility of the model through a case study of an embedded software application. The results suggest that the model can be used to quantify the impact of software fault recovery and correlated component failures on application reliability and performance.
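
As a rough illustration of the kind of computation an architecture-based model involves, the sketch below evaluates application reliability over a DTMC of control transfers, with one component restart and a fixed number of application-level retries. It is not the paper's model: it assumes independent component failures (the very simplification the paper relaxes with the MVB distribution), and the function names, the three-component architecture, and all probabilities are hypothetical.

```python
def effective_reliability(r, restart_success):
    """Reliability of a component that is restarted once after a failure.

    Assumption (not from the paper): the component either succeeds outright,
    or fails, restarts successfully, and succeeds on re-execution.
    """
    return r + (1.0 - r) * restart_success * r


def path_reliability(P, rel, start=0, tol=1e-12, max_iter=10_000):
    """Probability of completing the application successfully from `start`.

    P[i][j] is the DTMC probability that control passes from component i to j.
    A component whose row is all zeros is treated as the final component.
    Failures are assumed independent here, unlike in the paper.
    """
    n = len(P)
    x = [0.0] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            if sum(P[i]) == 0.0:
                # Final component: it only has to execute correctly.
                new.append(rel[i])
            else:
                # Execute correctly, then succeed from whichever component follows.
                new.append(rel[i] * sum(P[i][j] * x[j] for j in range(n)))
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new[start]
        x = new
    return x[start]


def with_retries(single_attempt, retries):
    """Application-level retries, assuming each attempt is independent."""
    return 1.0 - (1.0 - single_attempt) ** (retries + 1)


# Hypothetical three-component architecture: 0 -> {1, 2}, 1 -> 2, 2 is final.
P = [
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
rel = [0.99, 0.98, 0.97]

baseline = path_reliability(P, rel)
restarted = path_reliability(P, [effective_reliability(r, 0.9) for r in rel])
recovered = with_retries(restarted, retries=1)
```

Comparing `baseline`, `restarted`, and `recovered` shows how each recovery stage raises the estimate under these assumptions; with positively correlated failures the gain from restarts and retries would generally be smaller than this independent-failure sketch suggests.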