Live-Out Register Fencing: Interrupt-Triggered Soft Error Correction Based on the Elimination of Register-to-Register Communication
2016
This article introduces Live-Out Register Fencing (LoRF), a soft error correction mechanism that uses the novel Spill Register File as a container of checkpointing data. LoRF’s Spill Register File holds the values shared among basic blocks in the program, and, coupled with a new compilation strategy, LoRF allows for error correction in the same basic block where the error was detected. In LoRF, error correction is triggered by a hardware interrupt that restores the registers of a basic block from the Spill Register File. After these registers are restored, the basic block where the error was detected can just be re-executed, thus reducing the costs of error recovery. LoRF’s error correction policy eliminates the need for expensive architectural support for checkpointing and rollback, reducing the performance overhead of online soft error correction. LoRF relies on both a modified processor architecture and a corresponding compiler. The architecture was implemented in synthesizable VHDL, whereas the compiler was developed as an extension of the LLVM framework. Fault injection experiments support an error correction coverage of 99.35p and a mean performance overhead of 1.33 for the entire life cycle of an error from its occurrence to its elimination from the system.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
29
References
1
Citations
NaN
KQI