A massively-parallel Navier-Stokes implementation

D A Calahan

1989

Author affiliations: Student, Aerospace Engineering; Student, Electrical Engineering and Computer Science; Professor, Electrical Engineering and Computer Science.
Copyright © American Institute of Aeronautics and Astronautics, Inc., 1989. All rights reserved.

The results of implementing a 3-D production MacCormack explicit Navier-Stokes code with damping on a 1024-node NCUBE hypercube are presented. Among the issues and results presented are (1) grid partitioning, (2) performance analysis and comparison with CRAY vector processors, and (3) a detailed timing model.

The regularity of the uniform grids traditionally used in flow modeling has been responsible for the success of vector multiprocessors such as CRAY-class machines in solving fluid flow problems. Parallelization of serial codes on such shared-memory machines by multitasking is usually straightforward and achieves high processor utilization for the common 4- to 8-processor configuration. Indeed, success on such architectures has inspired other investigators to study the distribution of data as well as computation on so-called massively-parallel machines such as the hypercube.

Such implementations are typically far more difficult to code, since no common memory exists. The payoff for systematically organizing interprocessor data flow is the reduction of total data flow and of the consequent conflicts over data paths. In contrast, access conflicts in shared-memory multiprocessors are assumed to occur randomly and presently limit the number of processors to eight. Fortunately, the same distributed CFD algorithms can be the basis of many production codes and are applicable to several current MIMD commercial architectures, so that significant programming effort can be justified.

This paper extends the distributed CFD results of Catherasoo and others to include a full Navier-Stokes (N-S) code with damping and, more importantly, shows the efficiency of distributed N-S algorithms up to 1024 nodes (processors).
Detailed timing analysis identifies sources of the modest parallelization overhead.

[...] this was transformed to a rectangular uniform coordinate system. A simple grid generator was written to produce the x, y, z coordinates for a grid of specified size. The characteristic common to all grids used in the timing cases was that grid points were evenly spaced at increments of 0.2 radially (outward from the cylinder's surface) and 0.2 axially (along the length of the cylinder). See Figure 1 for two views of a typical grid.

The same data set was used for all timing cases. The data set had the following characteristics:

1) Freestream Mach Number = 1.1
2) Freestream Reynolds Number = 10,000
3) Freestream Temperature = 461.7 R
4) Wall Temperature = 550.25 R
5) Characteristic Length = 1.0 (cylinder of radius 1.0)
6) Damping coefficients = 3
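
The grid generator described above can be illustrated with a minimal sketch. It is not the authors' generator. It assumes a cylinder of radius 1.0 whose axis lies along z, a full 360-degree sweep with uniform angular spacing (the angular layout is not stated in the text), and a hypothetical function name `make_cylinder_grid`. Only the 0.2 radial and axial increments and the unit radius come from the description above.

```python
import math


def make_cylinder_grid(n_r, n_theta, n_z, radius=1.0, dr=0.2, dz=0.2):
    """Return grid[i][j][k] = (x, y, z) for a grid wrapped around a cylinder.

    i indexes radial stations (outward from the surface), j indexes
    angular stations, k indexes axial stations. The axis orientation
    and the uniform angular spacing are assumptions of this sketch.
    """
    grid = []
    for i in range(n_r):
        r = radius + i * dr              # evenly spaced radially
        shell = []
        for j in range(n_theta):
            theta = 2.0 * math.pi * j / n_theta   # assumed uniform in angle
            line = []
            for k in range(n_z):
                z = k * dz               # evenly spaced axially
                line.append((r * math.cos(theta), r * math.sin(theta), z))
            shell.append(line)
        grid.append(shell)
    return grid


# Freestream data set listed above, gathered in one place for reference.
DATA_SET = {
    "mach": 1.1,
    "reynolds": 10_000,
    "freestream_temperature_R": 461.7,
    "wall_temperature_R": 550.25,
    "characteristic_length": 1.0,
}
```

Calling `make_cylinder_grid(3, 4, 5)` gives three radial shells at r = 1.0, 1.2, and 1.4, each with four angular stations and five axial stations at z = 0.0 through 0.8. Because the index space (i, j, k) is rectangular and uniform, it is the kind of regular grid that can be cut into equal blocks for distribution across processors.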
