Soft Error Effects on Arm Microprocessors: Early Estimations vs. Chip Measurements

2021 
Extensive research efforts are being carried out to evaluate and improve the reliability of computing devices, either through beam experiments or simulation-based fault injection. Unfortunately, it is still largely unclear to which extend fault injection can provide an accurate error rate estimation at early stages and if beam experiments can be used to identify the weakest resources in a device. The challenges associated with reliability evaluation grow with the increase of complexity of the hardware and the software. In this paper, we combine and analyze data gathered with extensive beam experiments (on the final physical CPU hardware) and microarchitectural fault injections (on early microarchitectural CPU models). We target a standalone Arm Cortex-A5 and an Arm Cortex-A9 integrated in an SoC and evaluate their reliability in bare-metal and Linux-based configurations. We find that both the SoC integration and the OS presence increase the system DUEs (Detected Unrecoverable Errors) rate (for different reasons) but do not significantly impact the SDCs (Silent Data Corruptions) rate which is solely attributed to the CPU core. Our reliability analysis demonstrates that, even considering SoC integration and OS inclusion, early, pre-silicon microarchitecture-level fault injection delivers accurate SDC rates estimations and lower bounds for the DUE rates.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []