Enhancing application performance using heterogeneous memory architectures on a many-core platform

2016 
The 2nd-generation Intel® Xeon Phi processor (codenamed Knights Landing, or KNL) is Intel's first self-booting Xeon Phi processor aimed at the HPC market. Like its predecessor, KNL is a many-core, highly threaded processor featuring an innovative on-die mesh interconnect and on-package high-bandwidth memory (MCDRAM) in addition to DDR-2400 DRAM, which lets many HPC applications achieve much higher performance by exploiting a heterogeneous memory configuration. In this paper, we look at the programming challenges software developers face when creating and manipulating data using different memory modes and a heap management API to satisfy the ever-increasing demand for high bandwidth and low latency. We start with a functional introduction to the KNL architecture, with an emphasis on the memory subsystem and memory usage model, followed by the utility tools required to run applications under various scenarios. We then present a profiler-based heterogeneous memory optimization framework for memory-bandwidth-intensive applications, and we introduce and discuss the new memory object features in Intel® VTune™ Amplifier. Finally, we show how to leverage different kinds of memory by using a user-extensible heap management API, the memkind API. Throughout the discussion, we use a classic streaming application in quantitative finance, the Black-Scholes benchmark. We show how to expose the memory bottleneck using the new memory profiling features in Intel VTune Amplifier, and how to achieve high bandwidth by removing that bottleneck and by allocating memory threads across the different types of memory. In the end, we show the peak performance achievable on KNL by using a combination of MCDRAM and DDR.