Reliable SEU monitoring and recovery using a programmable configuration controller

2017 
FPGAs are promising candidates for computational tasks in space. However, they are susceptible to radiation-induced errors in their configuration memory. Recovery from configuration errors, either by device scrubbing or by module-based recovery, involves a series of reads and writes to the FPGA's configuration port, and is efficiently performed on-chip by a fast, flexible and reliable reconfiguration controller. In this work, we consider the reliability improvement of the recently proposed Programmable Configuration Controller (PCC), a soft reconfiguration controller that has been shown to be both fast and flexible, but whose reliability, particularly in the face of radiation-induced configuration errors, has not until now been studied. To ensure that the PCC itself is reliable, we propose the use of traditional Triple Modular Redundancy (TMR) combined with a novel software-based interrupt-driven fault recovery process that leverages hardware-accelerated configuration access. We report on our design space exploration to balance the utilization, error recovery performance, and reliability of the PCC. In extremely harsh radiation environments, the Mean Time to Failure of the PCC is as high as 25 years, compared with 3.5 hours for its non-protected counterpart, and recovery from a configuration memory error affecting the PCC takes as little as 27 ms.
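The abstract does not describe the voter or the fault detection logic used in the PCC. As a rough illustration of the general TMR idea only, the following minimal sketch assumes a plain bitwise 2-of-3 majority voter over three replica outputs. The function name `tmr_vote` and its return convention are hypothetical and are not taken from the PCC design.

```python
def tmr_vote(a: int, b: int, c: int) -> tuple[int, list[int]]:
    """Bitwise 2-of-3 majority vote over three replica outputs.

    Hypothetical helper for illustration; not the PCC's actual voter.
    Returns (voted, disagreeing), where `disagreeing` lists the indices
    (0, 1, 2) of replicas whose output differs from the voted value.
    """
    # Each output bit takes the value held by at least two replicas.
    voted = (a & b) | (a & c) | (b & c)
    # Any replica that differs from the majority is a recovery candidate.
    disagreeing = [i for i, r in enumerate((a, b, c)) if r != voted]
    return voted, disagreeing


if __name__ == "__main__":
    # Replica 2 has one flipped bit; the vote masks it and flags the replica.
    voted, bad = tmr_vote(0b1010, 0b1010, 0b1110)
    print(bin(voted), bad)
```

In a scheme like the one outlined above, a non-empty `disagreeing` list is the kind of event that could trigger an interrupt-driven recovery routine, while the voted output keeps the system running in the meantime. How the PCC actually detects and localizes faults is not stated in this abstract.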