The cache and memory subsystems of the IBM POWER8 processor

2015 
In this paper, we describe the IBM POWER8i cache, interconnect, memory, and input/output subsystems, collectively referred to as the Bnest.[ This paper focuses on the enhancements made to the nest to achieve balanced and scalable designs, ranging from small 12-core single-socket systems, up to large 16-processor-socket, 192-core enterprise rack servers. A key aspect of the design has been increasing the end-to-end data and coherence bandwidth of the system, now featuring more than twice the bandwidth of the POWER7A processor. The paper describes the new memory-buffer chip, called Centaur, providing up to 128 MB of eDRAM (embedded dynamic random-access memory) buffer cache per processor, along with an improved DRAM (dynamic random-access memory) scheduler with support for prefetch and write optimizations, providing industry-leading memory bandwidth combined with low memory latency. It also describes new coherence-transport enhancements and the transition to directly integrated PCIe A (PCI Express A ) support, as well as additions to the cache subsystem to support higher levels of virtualization and scalability including snoop filtering and cache sharing.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    14
    References
    43
    Citations
    NaN
    KQI
    []