Comment on "Accurate and Scalable O(N) Algorithm for First-Principles Molecular-Dynamics Computations on Large Parallel Computers"

David R. Bowler, Tsuyoshi Miyazaki, Lionel A. Truflandier and M. J. Gillan

2014

While we acknowledge the progress made by Osei-Kuffuor and Fattebert in developing their O(N) algorithm [1], a number of the claims and statements made in the paper appear to us to be questionable, in particular the claim that they have presented the first truly scalable O(N) molecular dynamics algorithm. The claims that we wish to discuss in detail are: controllable accuracy; non-global communications and scalability; the approach to inversion of the overlap matrix; and their results.

There are a number of O(N) codes already available which offer controllable accuracy in the basis set. The ONETEP code [2] uses periodic sinc or psinc functions [3], while the Conquest code [4, 5] uses B-spline functions [6] and the FEMTECK code [7] uses finite elements [8]. In all these codes, the accuracy is systematically controlled using a grid spacing which is directly equivalent to a plane-wave cutoff, and which involves no approximation in the kinetic energy (whereas a finite-difference approach approximates the kinetic energy [9]).

The principle of removing global communications to achieve scalability is well established (and papers on sparse matrix multiplication [10, 11] and on sparse, parallel matrix multiplication in the FreeON [12, 13], ONETEP [14], CP2K [15] and Conquest [16] codes show that this is strongly developed in the O(N) community). We disagree with the authors' assertion that "the parallel implementation of [algorithms to invert the S matrix] generally require some global coupling". There are a number of existing approaches to inverting the S matrix, including the orbital minimisation method (OMM) [17, 18] used by FEMTECK, the method used in OpenMX [19], and Hotelling's or Schulz's method (which will be scalable and O(N) with sparse matrix algebra), as well as the approximate inverse methods cited by the authors.
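
Hotelling's (Schulz's) iteration needs nothing but matrix products, which is why it becomes scalable and O(N) once those products are sparse. The following is a minimal sketch in Python, not code from any of the packages named above: the dict-of-dicts storage, the magnitude threshold `tol` (production O(N) codes usually truncate by spatial range instead), the starting guess X_0 = I/||S||_inf and the hypothetical model overlap `chain_overlap` are all illustrative assumptions. Each step computes X_{k+1} = X_k (2I - S X_k) and drops small entries.

```python
"""Minimal sparse Hotelling/Schulz iteration for the inverse of an overlap matrix S.

Illustrative sketch only: the dict-of-dicts storage, the magnitude threshold
`tol`, and the model overlap matrix are assumptions, not taken from any O(N) code.
"""

from typing import Dict, Tuple

SparseMatrix = Dict[int, Dict[int, float]]  # row index -> {column index: value}


def sparse_matmul(a: SparseMatrix, b: SparseMatrix, tol: float) -> SparseMatrix:
    """Return a @ b, dropping entries with |value| < tol to keep the result sparse."""
    result: SparseMatrix = {}
    for i, row_a in a.items():
        acc: Dict[int, float] = {}
        for k, a_ik in row_a.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        kept = {j: v for j, v in acc.items() if abs(v) >= tol}
        if kept:
            result[i] = kept
    return result


def sparse_axpby(alpha: float, a: SparseMatrix, beta: float, b: SparseMatrix,
                 tol: float) -> SparseMatrix:
    """Return alpha * a + beta * b, with the same truncation as sparse_matmul."""
    result: SparseMatrix = {}
    for i in set(a) | set(b):
        row: Dict[int, float] = {}
        for j, v in a.get(i, {}).items():
            row[j] = row.get(j, 0.0) + alpha * v
        for j, v in b.get(i, {}).items():
            row[j] = row.get(j, 0.0) + beta * v
        kept = {j: v for j, v in row.items() if abs(v) >= tol}
        if kept:
            result[i] = kept
    return result


def max_residual(sx: SparseMatrix, n: int) -> float:
    """Largest |entry| of I - SX over the stored entries (plus the diagonal)."""
    worst = 0.0
    for i in range(n):
        row = sx.get(i, {})
        worst = max(worst, abs(1.0 - row.get(i, 0.0)))
        for j, v in row.items():
            if j != i:
                worst = max(worst, abs(v))
    return worst


def schulz_inverse(s: SparseMatrix, n: int, tol: float = 1e-12,
                   conv: float = 1e-8, max_iter: int = 50) -> Tuple[SparseMatrix, int]:
    """Approximate S^-1 with X_{k+1} = 2 X_k - X_k S X_k; returns (X, number of updates)."""
    # The largest absolute row sum bounds the largest eigenvalue (Gershgorin), so for
    # symmetric positive definite S the start X_0 = I / ||S||_inf gives ||I - S X_0||_2 < 1.
    norm_inf = max(sum(abs(v) for v in row.values()) for row in s.values())
    x: SparseMatrix = {i: {i: 1.0 / norm_inf} for i in range(n)}
    for iteration in range(max_iter):
        sx = sparse_matmul(s, x, tol)
        if max_residual(sx, n) < conv:
            return x, iteration
        xsx = sparse_matmul(x, sx, tol)
        x = sparse_axpby(2.0, x, -1.0, xsx, tol)
    raise RuntimeError("Schulz iteration did not converge")


def chain_overlap(n: int, diag: float = 1.0, off: float = 0.2) -> SparseMatrix:
    """Hypothetical model overlap: a 1D chain with nearest-neighbour overlap `off`."""
    s: SparseMatrix = {}
    for i in range(n):
        row = {i: diag}
        if i > 0:
            row[i - 1] = off
        if i < n - 1:
            row[i + 1] = off
        s[i] = row
    return s


if __name__ == "__main__":
    n = 200
    s = chain_overlap(n)
    x, iters = schulz_inverse(s, n)
    widest = max(len(row) for row in x.values())
    print(f"converged after {iters} updates; widest row of X holds {widest} of {n} entries")
```

Because I - S X_{k+1} = (I - S X_k)^2, the residual is squared at every step. For a well-conditioned S whose inverse decays with distance, the threshold keeps the number of stored entries per row roughly independent of N, so each update costs O(N) and needs only data from nearby rows; how well this holds for a particular overlap matrix depends on its conditioning, which this sketch does not address.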

We note that the approach that the authors suggest is essentially the same as that used in 1994 by Stechel et al. [20].

Moreover, it is important to recall that one of the major efforts in the O(N) community in recent years has been to develop locally communicating, scalable codes. The CP2K code has recently demonstrated calculations on 1,000,000 atoms with density functional tight binding (DFTB) and on 96,000 water molecules with DFT, scaling to 46,656 cores [15]. The Conquest code has demonstrated scaling to over 2,000,000 atoms [21] on 4,096 processors, and has recently scaled to 196,000 cores on the K computer [22], as shown in Fig. 1. In the data in Fig. 2 presented by Osei-Kuffuor and Fattebert, there seems to be a slow increase in wall-clock time with system size on the IBM BGQ, which indicates some residual problems with scalability in the implementation.