Modeling soft errors for data caches and alleviating their effects on data reliability

2010 
Soft errors caused by energetic particle strikes pose a significant reliability concern for computing systems. In this study, we first introduce a model for soft error occurrence and propagation in cache memories. Based on this model, we define a metric called Architectural Vulnerability Factor for Caches (AVFC), which represents the probability that a fault in the cache becomes visible in the final output of the program. We then propose three architectural schemes for improving reliability. Our first scheme prevents an error from propagating to the lower levels of the memory hierarchy by not forwarding the unmodified data words of dirty cache blocks to the L2 cache at write-backs. The second scheme selectively invalidates cache blocks to reduce their vulnerable periods. To reduce the performance overhead caused by block invalidation, our third scheme tries to bring a fresh copy of the invalidated block into the cache via prefetching. The experimental results for the SPEC2000 suite show that, based on the proposed model, our first and third schemes together can improve data reliability by roughly 96% at the cost of less than 1% overhead in execution time, considerably more than the improvement achieved by either of two well-known techniques, namely write-through and early write-back cache mechanisms.
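The idea behind the first scheme can be illustrated with a minimal sketch. It assumes per-word dirty bits, a four-word block, and the hypothetical names `CacheBlock` and `write_back`; none of these come from the paper, whose actual hardware mechanism is not described in this abstract. On a write-back, only words the program modified are forwarded, so a bit flip in an unmodified word does not reach the L2 copy.

```python
from dataclasses import dataclass, field
from typing import List

WORDS_PER_BLOCK = 4  # assumed block size, for illustration only


@dataclass
class CacheBlock:
    """Hypothetical L1 block with one dirty bit per word (an assumption)."""
    words: List[int]
    dirty: List[bool] = field(default_factory=lambda: [False] * WORDS_PER_BLOCK)

    def write(self, index: int, value: int) -> None:
        # A program store updates the word and marks only that word dirty.
        self.words[index] = value
        self.dirty[index] = True


def write_back(block: CacheBlock, l2_block: List[int]) -> List[int]:
    """Return the updated L2 copy, taking only modified words from L1."""
    return [new if is_dirty else old
            for new, old, is_dirty in zip(block.words, l2_block, block.dirty)]


l2_copy = [10, 20, 30, 40]
block = CacheBlock(words=list(l2_copy))
block.write(1, 99)        # legitimate store to word 1
block.words[2] ^= 0x4     # simulated soft error in an unmodified word
updated_l2 = write_back(block, l2_copy)
```

In this sketch the corrupted word 2 stays confined to the L1 block, while the L2 copy receives only the stored value in word 1.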