A dynamic reconfiguration mechanism to increase the reliability of GPGPUs

2020 
General Purpose Graphic Processing Units (GPGPUs) are effective solutions for high-demanding data processing applications. Recently, they started to be used even in safety-critical applications, such as autonomous car driving systems. GPGPUs are implemented using the latest semiconductor technologies, which are more prone to faults arising during the lifetime operation. However, until now fault mitigation solutions were not extensively included in GPGPUs, due to the limited reliability requirements of the applications they were originally intended for (e.g., gaming or multimedia). This work proposes a dynamically configurable self- repairing mechanism aimed at mitigating the impact of permanent faults in the Scalar Processor (SP) cores in GPGPUs. The mechanism is based on spare modules that can be used to replace faulty SPs when a fault is detected. A configuration instruction allows dynamically controlling in software the selection of the set of active SPs in the SM. The method is extremely flexible since it does not require any change in the application software. Experimental results show that the solution introduces a moderate area overhead while allowing continue working even in the case of any permanent faults affecting the SPs.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    26
    References
    5
    Citations
    NaN
    KQI
    []