A Framework for Design of Self-Repairing Digital Systems

2019 
This paper introduces a scalable framework for the design of self-testable, self-correcting, and self-repairing digital systems. Modular redundancy and reprogrammability are used to accomplish generic self-test and to enable self-repair. Bit error rates (BERs) are measured throughout the design to distinguish transient errors from errors caused by semi-permanent or permanent logic faults. Triple modular redundancy (TMR) is used for error correction and fault isolation, with a fourth module available for automated repair. Modular reconfiguration (repair) occurs automatically, so the system continues to operate error-free even during partial dynamic reconfiguration in FPGAs. The state of a repaired module is re-synchronized with the running system within one cycle after the damaged module is replaced. The framework can repair multiple faults simultaneously while maintaining error-free operation. A case study evaluates the reliability improvement of an FPGA-based neural network image classification application.
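The voting and fault-classification idea described in the abstract can be illustrated with a small software model. The sketch below is not the paper's implementation: it assumes a bitwise 2-of-3 majority voter, a per-module counter of bits that disagree with the voted output, and a fixed BER threshold separating transient from persistent faults. The names `majority_vote`, `ModuleMonitor`, `classify`, and `run_demo`, the 8-bit width, and the threshold value of 0.01 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant module outputs."""
    return (a & b) | (a & c) | (b & c)


@dataclass
class ModuleMonitor:
    """Counts bits of one module's output that disagree with the voted result."""
    width: int            # output width in bits (assumed fixed)
    bit_errors: int = 0   # disagreeing bits observed so far
    bits_seen: int = 0    # total bits compared so far

    def record(self, output: int, voted: int) -> None:
        mask = (1 << self.width) - 1
        self.bit_errors += bin((output ^ voted) & mask).count("1")
        self.bits_seen += self.width

    @property
    def ber(self) -> float:
        return self.bit_errors / self.bits_seen if self.bits_seen else 0.0


def classify(monitor: ModuleMonitor, threshold: float) -> str:
    """Hypothetical rule: a BER at or above the threshold suggests a persistent fault."""
    if monitor.bit_errors == 0:
        return "healthy"
    return "persistent" if monitor.ber >= threshold else "transient"


def run_demo() -> list:
    """Three copies of an identity function over 100 cycles, with injected faults."""
    monitors = [ModuleMonitor(width=8) for _ in range(3)]
    for x in range(100):
        outputs = [
            x ^ 0b1000 if x == 10 else x,  # module 0: one single-cycle bit flip
            x,                             # module 1: fault-free
            x | 1,                         # module 2: bit 0 stuck at 1
        ]
        voted = majority_vote(*outputs)
        for monitor, output in zip(monitors, outputs):
            monitor.record(output, voted)
    return [classify(m, threshold=0.01) for m in monitors]


if __name__ == "__main__":
    print(run_demo())
```

In this model a module classified as persistent would be the candidate for replacement by the spare fourth module, while a transient classification would leave the module in service. How the framework actually sets its thresholds and measurement windows is not stated in the abstract.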