LFTSM: Lightweight and Fully Testable SEU Mitigation System for Xilinx Processor-Based SoCs

2020 
Field-Programmable Gate Arrays (FPGAs) provide a cutting-edge platform for meeting the performance, cost, dependability, and flexibility requirements of on-board data processing in mission-critical and safety applications. However, commercial off-the-shelf SRAM-based FPGAs are susceptible to radiation-induced Single Event Upsets (SEUs). The detection and mitigation of SEUs are therefore of paramount importance. SEU mitigation techniques such as Triple Modular Redundancy (TMR) and configuration scrubbing are well known. However, these techniques either incur high resource overheads or rely on resources that are themselves susceptible to SEUs. In this work, we propose a Lightweight and Fully Testable SEU Mitigation (LFTSM) system that combines the high-speed internal configuration repair mechanism of Xilinx FPGAs with a robust external scrubber running on the processor cores, targeting SEUs in SRAM-based Xilinx SoC FPGAs (Zynq). The internal repair mechanism corrects single-bit upsets and notifies the external scrubber when multi-bit upsets are detected. Multi-bit upsets are classified and repaired by the external scrubber. The proposed LFTSM system aims to achieve reliability in resource-intensive FPGA application systems while keeping the resource overhead below 1% (on the XC7Z020 FPGA) and providing the widest fault coverage. Our system has the smallest resource utilization among the solutions in the literature and offers full testing control in compliance with the Automotive Safety Integrity Level (ASIL), the risk classification scheme defined by ISO 26262. Our solution requires neither external memories nor third-party tools. We implemented the LFTSM system on a Xilinx Zynq SoC (with the XC7Z020 FPGA). We validated the fault detection efficiency of our design using fault injection testing with complete control over the number and locations of the errors injected into the configuration memory.
For the XC7Z020 device, LFTSM scans all configuration bits in multiple microseconds, detects upsets within 8 ms, and then corrects single-bit and multi-bit upsets within a few further milliseconds. We successfully integrated and tested the proposed LFTSM system with resource-hungry industrial application systems for the automotive domain.
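The two-tier division of labour described above (single-bit upsets repaired internally, multi-bit upsets handed to the external scrubber) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' implementation: configuration memory is modelled as lists of integer words, and upsets are found by comparing against an in-memory reference copy, which stands in for the device's built-in checks (the abstract states that LFTSM itself needs no external memory). The names `ScrubReport`, `count_flipped_bits`, and `scrub` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScrubReport:
    frame: int          # index of the frame that contained upsets
    flipped_bits: int   # number of bits that differed from the reference
    handled_by: str     # "internal" (single-bit) or "external" (multi-bit)


def count_flipped_bits(word: int, reference_word: int) -> int:
    """Count the bits that differ between two configuration words."""
    return bin(word ^ reference_word).count("1")


def scrub(frames: List[List[int]], reference: List[List[int]]) -> List[ScrubReport]:
    """Scan every frame, repair upsets in place, and report who handled them.

    Assumption: a reference copy is used here purely as a stand-in for
    hardware error detection; it is not the mechanism used by LFTSM.
    """
    reports = []
    for index, (frame, good) in enumerate(zip(frames, reference)):
        flipped = sum(count_flipped_bits(w, g) for w, g in zip(frame, good))
        if flipped == 0:
            continue
        # Single-bit upsets go to the fast internal path; anything larger
        # is escalated to the external scrubber.
        handler = "internal" if flipped == 1 else "external"
        frame[:] = good  # repair the frame in place
        reports.append(ScrubReport(index, flipped, handler))
    return reports


if __name__ == "__main__":
    reference = [[0xFFFF0000, 0x12345678] for _ in range(4)]
    frames = [list(frame) for frame in reference]
    frames[1][0] ^= 0x001   # inject a single-bit upset into frame 1
    frames[3][1] ^= 0x101   # inject a two-bit upset into frame 3
    for report in scrub(frames, reference):
        print(report)
```

Injecting faults at chosen frame and bit positions, as in the demo at the bottom, mirrors in miniature the controlled fault injection that the abstract describes for validation.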