This article explores the coupling of coarse- and fine-grained parallelism for Finite Element simulations based on efficient parallel multigrid solvers. The focus lies on both system performance and a minimally invasive integration of hardware acceleration into an existing software package, requiring no changes to application code. We demonstrate the viability of our approach by using commodity graphics processors (GPUs), which offer an excellent price/performance ratio, as efficient multigrid preconditioners. We address the issue of limited precision on GPUs by applying a mixed precision, iterative refinement technique. Other restrictions are handled by a close interplay between the GPU and the CPU. From a software perspective, we integrate the GPU solvers into the existing MPI-based Finite Element package by implementing the same interfaces as the CPU solvers, so that they are easily interchangeable for the application programmer. Our results show that we do not compromise any software functionality and achieve speedups of a factor of two and more for large problems. Equipped with this additional option of hardware acceleration, we compare different strategies for increasing the performance of a conventional, commodity-based cluster: increasing the number of nodes, replacing the nodes with a newer technology generation, and adding powerful graphics cards to the existing nodes.
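The mixed precision, iterative refinement technique mentioned above computes residuals and accumulates corrections in double precision, while delegating the inner solve to single precision, where the GPU is fast. The following is a minimal NumPy sketch of this idea only; the function name is ours, and the dense direct inner solve merely stands in for the GPU multigrid preconditioner used in the actual solver:

```python
import numpy as np

def mixed_precision_refinement(A, b, tol=1e-12, max_iter=50):
    """Solve Ax = b to double precision accuracy using only a
    single precision inner solve (standing in for the GPU's role)."""
    A32 = A.astype(np.float32)          # low-precision copy for the inner solve
    x = np.zeros_like(b)
    for _ in range(max_iter):
        r = b - A @ x                   # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # inner solve in single precision (illustrative direct solve)
        c = np.linalg.solve(A32, r.astype(np.float32))
        x += c.astype(np.float64)       # correction accumulated in double
    return x
```

Because each correction is small relative to the solution, the limited precision of the inner solve does not limit the final accuracy, as long as the outer residual and update are carried out in double precision.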
We have previously presented an approach to include graphics processing units as co-processors in a parallel Finite Element multigrid solver called FEAST. In this paper we show that the acceleration transfers to real applications built on top of FEAST, without any modifications of the application code. The chosen solid mechanics code is well suited to assess the practicability of our approach due to higher accuracy requirements and a more diverse CPU/co-processor interaction. We demonstrate in detail that the single precision execution on the co-processor does not affect the final accuracy, and analyse how the local acceleration gains of factors 5.5 to 9.0 translate into a 1.6- to 2.6-fold total speed-up.
Processor technology is still advancing dramatically and promises enormous improvements in processing data over the next decade. These improvements are driven by parallelisation and specialisation of resources, and ‘unconventional hardware’ like GPUs or the Cell processor can be seen as forerunners of this development. At the same time, much smaller advances are expected in moving data; this means that the efficiency of many simulation tools – particularly those based on Finite Elements, which often lead to huge but very sparse linear systems – is restricted by the cost of memory access. We explain our approach of combining efficient data structures with multigrid solver concepts, and discuss the influence of processor technology on numerical and algorithmic developments. Concepts of ‘hardware-oriented numerics’ are described, and their numerical and computational characteristics are examined based on implementations in FEAST, a high-performance solver toolbox for Finite Elements that is able to exploit unconventional hardware components as ‘FEM co-processors’, on sequential as well as on massively parallel computers. Finally, we demonstrate prototypically how these algorithmic and computational concepts can be applied to solid mechanics problems, and we present simulations on heterogeneous parallel computers with more than one billion unknowns.
In this paper, multigrid smoothers of Vanka type are studied in the context of Computational Solid Mechanics (CSM). These smoothers were originally developed to solve saddle-point systems arising in the field of Computational Fluid Dynamics (CFD), particularly for incompressible flow problems. When treating (nearly) incompressible solids, similar equation systems arise, so it is reasonable to adopt the 'Vanka idea' for CSM. While there exist numerous studies of Vanka smoothers in the CFD literature, only a few publications describe applications to solid mechanical problems. With this paper we want to contribute to closing this gap. We depict and compare four different Vanka-like smoothers, two of which are oriented towards the stabilised equal-order Q1/Q1 finite element pair. By means of different test configurations we assess to what extent the smoothers are able to handle the numerical difficulties that arise for nearly incompressible material and anisotropic meshes. On the one hand, we show that the efficiency of all Vanka smoothers heavily depends on the proper choice of parameters. On the other hand, we demonstrate that only some of them are able to deal robustly with more critical situations. Furthermore, we illustrate how embedding the multigrid scheme in an outer Krylov subspace method influences the overall solver performance, and we extend all our examinations to the nonlinear finite deformation case.
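To make the 'Vanka idea' concrete: a Vanka-type smoother sweeps over small patches of coupled unknowns (e.g. the velocity and pressure degrees of freedom attached to one element) and solves each local subsystem exactly. The following NumPy sketch shows this structure as a damped block Gauss-Seidel iteration; the function name, the patch layout, and the damping value are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def vanka_smoother(A, b, x, patches, sweeps=1, omega=0.8):
    """Damped 'Vanka-like' block Gauss-Seidel: for each patch (a small
    index set of coupled unknowns), solve the local subsystem exactly
    and apply the damped correction to the global iterate."""
    for _ in range(sweeps):
        for idx in patches:
            # residual restricted to the patch rows, using current x
            r = b[idx] - A[idx] @ x
            # exact solve on the small local sub-block
            x[idx] += omega * np.linalg.solve(A[np.ix_(idx, idx)], r)
    return x
```

In an actual solver, the patches follow the mesh (typically one per element or pressure node), and the damping parameter must be tuned carefully, which mirrors the paper's observation that the smoothers' efficiency heavily depends on the proper choice of parameters.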
We have previously suggested a minimally invasive approach to include hardware accelerators in an existing large-scale parallel finite element PDE solver toolkit, and implemented it in our software FEAST. Our concept has the important advantage that applications built on top of FEAST benefit from the acceleration immediately, without changes to application code. In this paper we explore the limitations of our approach by accelerating a Navier-Stokes solver. This nonlinear saddle point problem is much more involved than our previous tests and does not exhibit an equally favourable acceleration potential: not all computational work is concentrated inside the linear solver. Nonetheless, we are able to achieve speedups of more than a factor of two on a small GPU-enhanced cluster. We conclude with a discussion of how our concept can be altered to further improve acceleration.