Analysis of NUMA effects in modern multicore systems for the design of high-performance data transfer applications

2017 
To capitalize on multicore power, modern high-speed data transfer applications usually adopt a multi-threaded design and aggregate multiple network interfaces. However, NUMA introduces another dimension of complexity to these applications. In this paper, we conduct comprehensive experiments on real systems to illustrate the importance of NUMA-awareness for applications with intensive memory accesses and network I/O. Instead of simply attributing the NUMA effect to the physical layout, we provide an in-depth analysis of the underlying interactions inside hardware devices. We profile system performance by monitoring relevant hardware counters, and reveal how the NUMA penalty arises during the prefetch and cache synchronization processes. Based on these findings, we implement a thread mapping module in a bulk data transfer application, BBCP, as a practical example of enabling NUMA-awareness. The enhanced application is then evaluated on our high-performance testbed with storage area networks (SAN). Our experimental results show that the proposed NUMA optimizations can significantly improve BBCP's performance in memory-based tests with various contention levels and in realistic data transfers involving SAN-based storage.

Comprehensive tests on modern hosts show the importance of NUMA-awareness.
A first-order analysis of the NUMA effect on state-of-the-art high-end systems.
A detailed study shows how NUMA effects are amplified in multicore hardware.
The specific benefits of NUMA-awareness are exemplified with a modern real-world application.
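
The idea of a thread mapping module can be illustrated with a minimal sketch. It is not BBCP's implementation, which the abstract does not describe; it only assumes a Linux host that exposes NUMA topology through sysfs. The function names (`parse_cpulist`, `cpus_of_node`, `nic_node`, `pin_to_node`) are hypothetical and chosen for illustration. The sketch covers CPU affinity only; placing buffers on the same node would additionally require a NUMA memory policy, which is not shown here.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_of_node(node):
    """Return the CPU ids that belong to a NUMA node (Linux sysfs, assumed present)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def nic_node(iface):
    """Return the NUMA node a network interface is attached to, or -1 if unknown."""
    with open(f"/sys/class/net/{iface}/device/numa_node") as f:
        return int(f.read().strip())


def pin_to_node(node):
    """Restrict the caller to the CPUs of one NUMA node and return those CPUs."""
    cpus = cpus_of_node(node)
    os.sched_setaffinity(0, cpus)  # 0 selects the caller; Linux-only call
    return cpus
```

A transfer thread serving a given interface could, under these assumptions, call `pin_to_node(nic_node("eth0"))` at startup, where `"eth0"` is a placeholder interface name, so that it runs on cores local to that NIC.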