Dynamic Error Detection for Dependable Cache Coherency in Multicore Architectures

2008 
In chip multiprocessor (CMP) systems the various effects of technology scaling make the on chip components more susceptible to faults. Most of the earlier schemes that address fault tolerance issues in CMPs adopt redundant-thread techniques. These techniques are mostly effective, except that they fail to detect errors resulting from faults in hardware components on chip that commonly serve multiple cores. The cache coherence controller (CC) logic, which ensures consistency of data shared among multiple threads, is a vital common component in CMPs. A fault in CC logic of any of the processors may lead to errors in the data states in the entire CMP system. It is observed that up to 59.6% of the memory references cause a change in cache state for SPLASH-2 applications. We propose a novel scheme with a verification logic that can dynamically detect errors in the CC logic of multiple cores in a CMP system. The entire verification logic is designed with a negligible area of 0.1372 sq.mm using a TSMC 0.18 mu4-metal layer process technology. Even at highly aggressive fault injection rates, the logic achieves an average error coverage of more than 95% (and almost 100% for some applications)
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    25
    References
    19
    Citations
    NaN
    KQI
    []