PaScaL_TDMA: A library of parallel and scalable solvers for massive tridiagonal systems

2021 
Abstract The aim of this study is to devise an efficient and scalable computational procedure to solve the many tridiagonal systems in multi-dimensional partial differential equations. The modified Thomas algorithm and a newly designed communication scheme were used to reduce the communication overhead encountered while solving the many tridiagonal systems. Benchmark test results reveal an advantage of the proposed procedures compared to global all-to-all communication methods — a significantly reduced communication time that becomes more prominent for larger problem sizes and greater number of cores. The proposed computational procedures are fully implemented in an open-source library called Parallel and Scalable Library for TDMA (PaScaL_TDMA). Considering a three-dimensional heat conduction problem as a practical example, we obtain good strong and weak scalability results up to 262,144 computing cores on the KISTI Nurion cluster system, which, to the best of our knowledge, is the largest parallel simulation for solving tridiagonal systems. The potential of this library for large-scale substantive problems in physics is also demonstrated through direct numerical simulations of the Rayleigh–Benard convection problem, which yielded excellent scalability and accurate results. Program summary Program Title: PaScaL_TDMA CPC Library link to program files: http://dx.doi.org/10.17632/49z6fh94z3.1 Developer’s repository link: https://github.com/MPMC-Lab Licensing provisions: MIT Programming language: Fortran90 Nature of problem: Tridiagonal systems for solving multi-dimensional partial differential equations. Solution method: The divide-and-conquer method is used to solve partitioned tridiagonal systems of equations in the distributed memory system. The partitioned tridiagonal systems of equations are transformed into modified sub-matrices using the modified Thomas algorithm [1]. Reduced tridiagonal systems are constructed by collecting the first and last rows of the modified submatrices from each computing core. The newly designed communication scheme based on MPI_Alltoallw accelerates to collect rows and construct the reduced tridiagonal systems. The solutions of the reduced tridiagonal systems are obtained via the sequential Thomas algorithm. Thereafter, the remaining unknowns in the modified sub-matrices are solved using solutions of the reduced tridiagonal systems.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    29
    References
    5
    Citations
    NaN
    KQI
    []