FERNANDO: A Software Transient Fault Tolerance Approach for Embedded Systems Based on Redundant Multi-Threading

2021 
As semiconductor technology scales, modern microprocessors become more vulnerable to transient faults. Software-level fault tolerance schemes are promising because they can improve reliability effectively without extra hardware. Redundant multi-threading (RMT) uses off-the-shelf cores as redundancy to achieve error resilience. However, the latest software RMT fault-tolerance models do not cope effectively with transient faults that occur in multiple components during application execution, which results in a large number of silent data corruptions (SDCs). To address this challenge, we propose FERNANDO, a software-level RMT runtime fault tolerance scheme that provides enhanced error detection and comprehensive error recovery through triple modular redundancy (TMR). On a simulated ARM Cortex-A57-like microprocessor, we performed transient fault injection experiments, based on a probability model, in different components of all cores. The results demonstrate that, compared to the state-of-the-art technique, FERNANDO reduces the SDC rate by about 86.67 percent and reduces the execution time overhead by about 19.64 percent.
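The abstract names TMR as the recovery mechanism but does not describe how FERNANDO implements it. As an illustration only, the following minimal sketch shows the generic idea behind TMR in software: run a computation in three redundant threads and majority-vote the results. The names `tmr_vote`, `run_redundant`, and `checksum`, along with the bit-flip parameters that simulate a single transient fault, are hypothetical. Python threads share one interpreter, so unlike the RMT setup in the paper, they do not isolate the replicas on separate cores.

```python
import threading
from typing import Callable, List, Optional, Tuple


def tmr_vote(results: List[int]) -> Tuple[Optional[int], bool]:
    """Majority vote over three replica results.

    Returns (voted_value, fault_detected). voted_value is None when all
    three replicas disagree: the error is detected but cannot be corrected.
    """
    a, b, c = results
    if a == b == c:
        return a, False          # all replicas agree: no fault observed
    if a == b or a == c:
        return a, True           # a agrees with one other replica
    if b == c:
        return b, True           # a is the outlier
    return None, True            # three-way disagreement: unrecoverable


def run_redundant(fn: Callable[[int], int], arg: int,
                  faulty_replica: Optional[int] = None,
                  flip_bit: int = 0) -> Tuple[Optional[int], bool]:
    """Run fn(arg) in three threads (hypothetical RMT harness) and vote.

    faulty_replica and flip_bit simulate a single-bit transient fault in
    one replica's output; they are illustrative, not a real fault injector.
    """
    results: List[int] = [0, 0, 0]

    def worker(i: int) -> None:
        value = fn(arg)
        if i == faulty_replica:
            value ^= 1 << flip_bit   # inject a single bit flip
        results[i] = value

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tmr_vote(results)


def checksum(n: int) -> int:
    """Toy workload: sum of squares below n, truncated to 32 bits."""
    return sum(i * i for i in range(n)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(run_redundant(checksum, 1000))                               # (332833500, False)
    print(run_redundant(checksum, 1000, faulty_replica=1, flip_bit=3)) # (332833500, True)
```

With two replicas voting against one, a single corrupted replica is both detected and masked. Duplication alone (two replicas) could detect the mismatch but could not decide which result is correct. That is why TMR, rather than plain duplication, is what enables recovery.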