Scaling up Hartree-Fock calculations on Tianhe-2

2016 
This paper presents a new optimized, scalable code for Hartree-Fock self-consistent field iterations. Design goals include scalability to large numbers of nodes and the ability to use CPUs and Intel Xeon Phi coprocessors simultaneously. We describe and address the issues encountered while optimizing and scaling up the code on Tianhe-2. A major issue is load balance, which integral screening makes challenging. We describe a general framework for finding a well-balanced static partitioning of the load in the presence of screening. Work stealing is then used to refine the load balance. Performance results are shown on the Stampede and Tianhe-2 supercomputers. Scalability is demonstrated on large simulations involving 2938 atoms and 27,394 basis functions, utilizing 8100 nodes of Tianhe-2.
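To illustrate why screening complicates load balance, the following is a minimal sketch and not the paper's framework. It assumes a hypothetical matrix `q` of Schwarz-type bounds per shell pair, keeps an integral quartet (ij|kl) only when `q[i][j] * q[k][l] >= tau`, and uses the surviving count as the estimated cost of task (i, j). The static partitioning step is a plain greedy longest-processing-time heuristic swapped in for illustration; the function names and the threshold `tau` are assumptions.

```python
import heapq


def screened_task_costs(q, tau):
    """Estimate the cost of each task (i, j) with i >= j as the number of
    shell pairs (k, l) that survive the screening test q[i][j] * q[k][l] >= tau.

    `q` is a hypothetical symmetric matrix of non-negative screening bounds.
    The brute-force double loop is for clarity only, not for large systems.
    """
    n = len(q)
    pairs = [(i, j) for i in range(n) for j in range(i + 1)]
    costs = {}
    for i, j in pairs:
        costs[(i, j)] = sum(1 for k, l in pairs if q[i][j] * q[k][l] >= tau)
    return costs


def greedy_partition(costs, num_workers):
    """Assign tasks to workers, largest estimated cost first, always giving
    the next task to the currently least-loaded worker (greedy LPT heuristic)."""
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for task, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    loads = {w: sum(costs[t] for t in tasks) for w, tasks in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    # Toy bounds that decay with index distance, so task costs are uneven.
    n = 6
    q = [[10.0 ** (-abs(i - j)) for j in range(n)] for i in range(n)]
    costs = screened_task_costs(q, tau=1e-4)
    assignment, loads = greedy_partition(costs, num_workers=3)
    print(loads)
```

Because the estimated costs vary from task to task once screening is applied, an equal-count split of tasks would generally be unbalanced; a cost-aware static split like the one sketched here leaves only a small residual imbalance for a runtime mechanism such as work stealing to absorb.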