Electromagnetics Computations Using the MPI Parallel Implementation of the Steepest Descent Fast Multipole Method (SDFMM).

Magda El-Shenawee,Carey M. Rappaport,Desheng Jiang,Waleed Melsei,Alexander B. Yakovlev,Sean Ortiz,Mete Ozkar,Amir Mortazawi,Michael B. Steer,C. W. Trueman,Aric Choiniere,Jean-Jacques Laurin,Abdelnasser A. Eldek,Atef Z. Elsherbeni,Charles E. Smith,Juergen Von Hagen,Werner Wiesbeck,Osama M. Abo-Seida

Electromagnetics Computations Using the MPI Parallel Implementation of the Steepest Descent Fast Multipole Method (SDFMM).

2002

The computational solution of large-scale linear systems of equations necessitates the use of fast algorithms but is also greatly enhanced by employing parallelization techniques. The objective of this work is to demonstrate the speedup achieved by the MPI (Message Passing Interface) parallel implementation of the Steepest Descent Fast Multipole Method (SDFMM). Although this algorithm has already been optimized to take advantage of the structure of the physics of scattering problems, there is still the opportunity to speed up the calculation by dividing tasks into components using multiple processors and solve them in parallel. The SDFMM has three bottlenecks ordered as (1) filling the sparse impedance matrix associated with the near-field Method of Moments interactions (MoM), (2) the matrix vector multiplications associated with this sparse matrix (3) the far field interactions associated with the fast multipole method. The parallel implementation task is accomplished using a thirty-one node Intel Pentium Beowulf cluster and is also validated on a 4-processor Alpha workstation. The Beowulf cluster consists of thirty-one nodes of 350MHz Intel Pentium IIs with 256 MB of RAM and one node of a 4x450MHz Intel Pentium II Xeon shared memory processor with 2GB of RAM with all nodes connected to a 100 BaseTX Ethernet network. The Alpha workstation has a maximum of four 667MHz processors. Our numerical results show significant linear speedup in filling the sparse impedance matrix. Using the 32-processors on the Beowulf cluster lead to achieve a 7.2 overall speedup while a 2.5 overall speedup is gained using the 4-processors on the Alpha workstation.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations