Explaining Cache SER Anomaly Using Relative DUE AVF Measurement

2010 
We have discovered that processors can experience a super-linear increase in detected unrecoverable errors (DUE) when the write-back L2 cache is doubled in size. This paper explains how an increase in the cache tag’s Architectural Vulnerability Factor or AVF caused such a super-linear increase in the DUE rate. AVF expresses the fraction of faults that become user-visible errors. Our hypothesis is that this increase in AVF is caused by a super-linear increase in “dirtydata residence times in the L2 cache. Using proton beam irradiation, we measured the DUE rates from the write-back cache tags and analyzed the data to show that our hypothesis holds. We utilized a combination of simulation and measurements to help develop and prove this hypothesis. Our investigation reveals two methods by which dirty line residency causes super-linear increases in the L2 cache tag’s AVF. One is a reduction in the miss rates as we move to the larger cache part, resulting in fewer evictions of data required for architecturally correct execution. The second is the occurrence of strided cache access patterns, which cause a significant increase in the “dirty” residency times of cache lines without increasing the cache miss rate.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    16
    References
    2
    Citations
    NaN
    KQI
    []