Maximizing Application Performance in a Multi-core, NUMA-Aware Compute Cluster by Multi-level Tuning

Gilad Shainer, Pak Lui, Martin Hilgeman, Jeffrey Layton, Cydney Stevens, Walker Stemple, Scot Schultz, Guy Ludden, Joshua Mora, Georg Kresse

2013

Achieving good application performance on a modern compute cluster of multi-core, multi-socket, NUMA-aware systems can be challenging. In this paper, we use VASP, a popular ab-initio quantum-mechanical MD simulation software, to investigate how tuning at the software, hardware, and network levels boosts performance on a Dell PowerEdge R815 HPC cluster with AMD “Interlagos” and “Abu-Dhabi” processors. We implement code changes with a free software stack that supports the FMA and AVX CPU instructions of the Bulldozer/Piledriver architecture. We profile the MPI communications, compare the scalability of different interconnects, and discuss various MPI tuning parameters. We show the effects of advanced InfiniBand features that are crucial to scalability, including MXM and SRQ, which optimize the use of network resources for MPI communications. We investigate the importance of MPI process placement and introduce a process allocation tool that facilitates affinity grouping on a multi-core architecture.
