Reduced-Precision DWC for Mixed-Precision GPUs

2020 
Duplication with Comparison (DWC) is an effective software-level solution to improve the reliability of computing systems, including Graphics Processing Units (GPUs). DWC, however, introduces performance and energy consumption overheads that could be unacceptable for High-Performance Computing (HPC) or real-time safety-critical applications. In this work, we propose Reduced-Precision DWC (RP-DWC): an improvement over the traditional DWC approach that uses the hardware resources of mixed-precision GPUs to implement fault detection. We investigate, through both fault injection campaigns and accelerated neutron beam experiments, the impact of RP-DWC on performance, energy consumption, and fault detection capabilities. We show that RP-DWC achieves on average 74% fault coverage (up to 86%) with very small overheads (0.1% time and 24% energy consumption overhead, in the best case).
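The abstract does not spell out how the comparison step works, so the following is only a minimal sketch under stated assumptions: the redundant copy is computed in a lower precision (here binary32 emulated on the CPU with Python's `struct` module, standing in for a GPU's reduced-precision units), and a mismatch is flagged when the two results differ by more than a relative tolerance. The function names, the dot-product workload, and the `rel_tol` value are all hypothetical and are not taken from the paper.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_full(a, b) -> float:
    """Primary computation in full (binary64) precision."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_reduced(a, b) -> float:
    """Redundant copy: every intermediate is rounded to binary32."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_float32(to_float32(x) * to_float32(y))
        acc = to_float32(acc + prod)
    return acc


def results_agree(full: float, reduced: float, rel_tol: float = 1e-4) -> bool:
    """Assumed comparison rule: accept if the difference is within rel_tol.

    The tolerance must absorb the legitimate rounding gap between the two
    precisions, so exact equality (as in classic DWC) cannot be used.
    """
    scale = max(abs(full), abs(reduced), 1.0)
    return abs(full - reduced) <= rel_tol * scale


def flip_bit(x: float, bit: int) -> float:
    """Emulate a single-bit upset in the binary64 encoding of x."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))[0]


a = [0.1 * i for i in range(1, 9)]
b = [1.0] * 8

full = dot_full(a, b)
reduced = dot_reduced(a, b)

fault_free_ok = results_agree(full, reduced)                 # no fault injected
exponent_fault_ok = results_agree(flip_bit(full, 62), reduced)  # high exponent bit
mantissa_fault_ok = results_agree(flip_bit(full, 0), reduced)   # lowest mantissa bit
```

In this sketch the fault-free pair passes the check, a flip in a high exponent bit is flagged, and a flip in the lowest mantissa bit slips under the tolerance. Under the assumptions above, that last case illustrates why a tolerance-based comparison cannot reach full coverage: errors smaller than the precision gap are indistinguishable from rounding.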