HPCG: Preliminary Evaluation and Optimization on Tianhe-2 CPU-only Nodes

2014 
HPCG has become a new metric for the design and ranking of HPC systems. It uses the Conjugate Gradient method, preconditioned with a local symmetric Gauss-Seidel preconditioner, to solve a sparse linear system. HPCG performs poorly because of its irregular memory access, and it may consume a great deal of MPI resources when executed on supercomputers. This paper focuses on optimizing SpMV and the Gauss-Seidel preconditioner, the two most important kernels in HPCG. After evaluating the performance impact of several representative sparse matrix formats, we select ELLPACK for its suitability for SIMD, which yields a speedup of 2.3x for the SpMV kernel. Multi-coloring is applied to Gauss-Seidel, resulting in a speedup of 7.3x over the reference implementation. The CG convergence rate may also improve after multi-coloring. Our experimental results show that these optimizations work well on supercomputers, achieving 6.5 Gflops on a CPU-only node. This boosts the total HPCG Gflops by about 7x, giving 80,151 Gflops on 8192 CPU-only Tianhe-2 nodes.
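To illustrate why ELLPACK suits SIMD, the following is a minimal sketch of an ELLPACK SpMV in plain Python. It is not the paper's implementation; the function names `csr_to_ell` and `ell_spmv`, the padding convention (column 0 with value 0.0), and the small test matrix are assumptions made for illustration. Every row is padded to the same width, so the inner loop over rows has a fixed trip count and no per-row branching, which is the property a vectorizing compiler can exploit.

```python
def csr_to_ell(row_ptr, col_idx, values):
    """Convert CSR arrays to padded ELLPACK arrays (hypothetical helper)."""
    n_rows = len(row_ptr) - 1
    # ELLPACK width is the maximum number of nonzeros in any row.
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)), default=0)
    # Padding entries use column 0 and value 0.0, so they add nothing to y.
    ell_cols = [[0] * width for _ in range(n_rows)]
    ell_vals = [[0.0] * width for _ in range(n_rows)]
    for i in range(n_rows):
        for k, p in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_cols[i][k] = col_idx[p]
            ell_vals[i][k] = values[p]
    return ell_cols, ell_vals, width


def ell_spmv(ell_cols, ell_vals, width, x):
    """Compute y = A @ x for a matrix stored in ELLPACK form."""
    n_rows = len(ell_vals)
    y = [0.0] * n_rows
    # Outer loop over the padded slots, inner loop over rows: each inner
    # iteration does the same work for every row, mimicking SIMD lanes.
    for k in range(width):
        for i in range(n_rows):
            y[i] += ell_vals[i][k] * x[ell_cols[i][k]]
    return y


# Small tridiagonal example: [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
cols, vals, width = csr_to_ell(row_ptr, col_idx, values)
y = ell_spmv(cols, vals, width, [1.0, 2.0, 3.0])
```

The trade-off in this sketch is the padding: rows shorter than the widest row store explicit zeros, which is cheap when row lengths are nearly uniform, as they are for stencil-like matrices.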