Victim cache

A victim cache is a small, usually fully associative cache placed in the refill path of CPU cache, and it stores all the blocks evicted from that level of cache. Miss caching places a fully-associative cache between cache and its re-fill path. Misses in the cache that hit in the miss cache have a one cycle penalty, as opposed to a many cycle miss penalty without the miss cache. Victim Caching is an improvement to miss caching that loads the small fully-associative cache with victim of a miss and not the requested cache line. A victim cache is a small, usually fully associative cache placed in the refill path of CPU cache, and it stores all the blocks evicted from that level of cache. Victim caching is a hardware technique to improve performance of caches proposed by Norman P. Jouppi. As mentioned in his paper: A victim cache is a hardware cache designed to decrease conflict misses and improve hit latency for direct-mapped caches. It is employed at the refill path of a Level 1 cache, such that any cache-line which gets evicted from the cache is cached in the victim cache. Thus, the victim cache gets populated only when data is thrown out of Level 1 cache. In case of a miss in Level 1, victim cache is looked up. If the resulting access is a hit, the contents of the Level 1 cache-line and the matching victim cache line are swapped. Though initially proposed by Jouppi to improve cache performance of a direct-mapped cache Level 1, modern day microprocessors with multi-level cache hierarchy employ Level 3/ Level 4 cache to act as victim cache for the cache lying above it in the memory hierarchy. Intel's Crystal Well of its Haswell processors introduced an on-package Level 4 cache which serves as a victim cache to processor's Level 3 cache. A 4–12 MB Level 3 cache is used as a victim cache in POWER5 (IBM) microprocessors. Processor performance and frequency has grown rapidly over the past three decades. Post mid-1980's it has grown by 52% annually largely driven by organisational and architectural ideas. 64-bit Intel Xeon processor (2004) can clock at 3.6 GHz thus having a cycle time of 0.27 ns. Memory cycle time however has not grown at this fast rate. This has resulted in processor cycle times being currently much faster than memory cycle times, and the trend has been for this gap to increase over time. The problem of increasing memory latency, relative to processor speed, has been dealt with by adding high speed cache memory. Direct-mapped caches have faster access time than set-associative caches. However, for a direct-mapped cache if multiple cache blocks in the memory map to same cache-line they end up evicting each other when anyone of them is accessed. This is known as the cache conflict problem. This problem is resolved by increasing the associativity of the cache. But there is a limit to which associativity can be increased owing to the complexity in its implementation. Thus, for solving the cache conflict problem for a cache with limited associativity victim cache is employed.

Parent Topic

Child Topic

No Parent Topic