A parallel row projection solver for large sparse linear systems
Abstract:
In this paper we present a parallel iterative solver for large, sparse, nonsymmetric linear systems. The solver is based on a row-projection algorithm derived from the symmetrized block version of the Kaczmarz method with conjugate gradient acceleration. A comparison with some Krylov subspace methods shows the remarkable robustness of this algorithm when applied to systems whose eigenvalues are arbitrarily distributed in the complex plane. The parallel version of the algorithm was developed for MIMD distributed memory machines and is based on a row-partitioning approach that allows each iteration to be computed as a simultaneous set of independent least squares problems. Moreover, we propose a data distribution strategy that leads to a scalable communication scheme. The algorithm has been tested on both the Intel iPSC/860 and the Intel Touchstone DELTA system, running the Intel NX message passing environment.
Keywords: Solver, MIMD, Distributed memory, Krylov subspace, Robustness
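To make the row-projection idea concrete, the sketch below shows one simultaneous block-projection sweep in NumPy. Each row block yields an independent least squares problem, which is what the row partitioning parallelizes; the simple averaging used here is a Cimmino-style combination, not the paper's symmetrized, CG-accelerated scheme, and all names and sizes are illustrative.

    import numpy as np

    def block_projection_sweep(A, b, x, blocks):
        """One simultaneous block row-projection sweep.

        Each row block contributes an independent least squares
        correction (in the parallel algorithm, each block is owned
        by a different processor); the corrections are then averaged.
        """
        corrections = []
        for rows in blocks:
            A_p, b_p = A[rows], b[rows]
            # Minimum-norm least squares solution of A_p @ d = b_p - A_p @ x,
            # i.e. the step that projects x onto {y : A_p @ y = b_p}.
            d, *_ = np.linalg.lstsq(A_p, b_p - A_p @ x, rcond=None)
            corrections.append(d)
        return x + sum(corrections) / len(corrections)

    # Toy usage: a consistent 8x8 system split into 4 row blocks.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 8))
    x_true = rng.standard_normal(8)
    b = A @ x_true
    x = np.zeros(8)
    blocks = [list(range(i, i + 2)) for i in range(0, 8, 2)]
    for _ in range(500):
        x = block_projection_sweep(A, b, x, blocks)
    print(np.linalg.norm(x - x_true))  # distance to the solution is nonincreasing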
Related papers:

This paper is concerned with the choice of an appropriate programming model for clusters of symmetric multiprocessor (SMP) machines. Software vendors who wish to take advantage of cluster-based multiprocessing can support message passing, hardware shared memory, or shared memory on top of a DSM runtime system. Vendors entering the marketplace are similarly faced with a difficult choice between a well-supported message passing standard (MPI) and an emerging shared memory standard (OpenMP) that offers convenience and the promise of performance.
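To make the contrast concrete, here is the same global reduction written in both styles; mpi4py and Python threads stand in for MPI and OpenMP purely for brevity, so this sketches the two models rather than either standard's syntax.

    # Message-passing style: each process owns a slice of the work and
    # all sharing happens through an explicit communication call.
    # Run with e.g. `mpiexec -n 4 python sum_mpi.py`.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    local = sum(range(rank * 25, (rank + 1) * 25))
    total = comm.allreduce(local, op=MPI.SUM)  # the explicit message step

The shared-memory version of the same reduction has no communication calls at all, only synchronization:

    # Shared-memory style: threads update a common variable directly.
    import threading

    total, lock = 0, threading.Lock()

    def work(t):
        global total
        local = sum(range(t * 25, (t + 1) * 25))
        with lock:  # synchronization replaces communication
            total += local

    threads = [threading.Thread(target=work, args=(t,)) for t in range(4)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    print(total)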
Although message passing is a versatile communication paradigm in the multiprocessing arena [AthSe88], a pure message-passing mechanism built on special communication channels is inefficient; shared-memory multiprocessor systems are usually much cheaper and more efficient. Conventional approaches tend to implement message passing on top of shared-memory architectures using a pure software approach, and even with special techniques [FinHe88], the performance of such systems is still worse than that of simple shared-memory communication systems. In this paper, we present a hardware approach and new message-passing mechanisms that the pure software approach does not support. Such an approach is quite cost effective, and the resulting system is suitable for studying concurrent software, scientific computation, and distributed problem solving.
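For reference, the "pure software approach" criticized above looks roughly like the toy channel below: message passing layered over shared memory (Python threads here; all names are illustrative). Every transfer pays for locking and copying, which is the overhead a hardware mechanism can avoid.

    import threading
    from collections import deque

    class Channel:
        """A toy message-passing channel built on shared memory."""

        def __init__(self, capacity=16):
            self._buf = deque()
            self._capacity = capacity
            lock = threading.Lock()
            self._not_empty = threading.Condition(lock)
            self._not_full = threading.Condition(lock)

        def send(self, msg):
            with self._not_full:
                while len(self._buf) >= self._capacity:
                    self._not_full.wait()  # back-pressure on a full buffer
                self._buf.append(msg)      # the copy into the shared buffer
                self._not_empty.notify()

        def recv(self):
            with self._not_empty:
                while not self._buf:
                    self._not_empty.wait()
                msg = self._buf.popleft()
                self._not_full.notify()
                return msg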
The availability of multiprocessing workstations offers the opportunity to apply both shared-memory and distributed-memory parallel processing techniques to a single application simultaneously. It has been assumed that using the data-sharing formalisms native to the specific machine would yield performance superior to that obtained with a more general message passing system. We compared the performance of pure message passing with a combination of message passing and shared-memory data sharing techniques and found that the benefit of the added complexity of dual-level programming is negligible or non-existent.
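As a sketch of what "dual level programming" means, the hypothetical fragment below combines mpi4py across processes with a thread pool inside each process; Python's GIL limits real thread speedup, so only the structure matters here, and all names are invented.

    from concurrent.futures import ThreadPoolExecutor
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    def chunk_work(i):
        return float(i * i)  # stand-in for real per-chunk computation

    # Level 1: distribute chunks across processes (distributed memory).
    my_chunks = range(rank, 1000, size)
    # Level 2: fan this process's chunks out to threads (shared memory).
    with ThreadPoolExecutor(max_workers=4) as pool:
        local = sum(pool.map(chunk_work, my_chunks))
    # Combine across the distributed-memory level again.
    total = comm.allreduce(local, op=MPI.SUM)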
Software Distributed Shared Memory (SDSM) systems provide programmers with a shared memory programming environment across distributed memory architectures. In contrast to the message passing programming environment, an SDSM can resolve data dependencies within the application without the programmer having to specify communication explicitly. However, this service comes at a cost in performance, so it makes sense to use message passing directly when data dependencies are easy to handle that way: for example, it is straightforward to specify data transfer for large contiguous regions of memory. This paper outlines how the Danui SDSM library has been extended to include support for message passing. Four different message passing transfers are identified, depending on whether the data being sent or received resides in private or globally shared buffers. Transfers between globally shared buffers are further categorized as symmetrical or asymmetrical, depending on whether they correspond to the same region of shared memory. The implication of each transfer type for the memory consistency of the global address space is discussed. Central to the Danui SDSM extension is the use of information provided and implied by message passing operations. The overhead of the implementation is analyzed.
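The abstract does not show Danui's interface, so the following is only a paraphrase of its transfer taxonomy as a small classification helper; every name here is hypothetical.

    from enum import Enum, auto

    class Transfer(Enum):
        PRIVATE_TO_PRIVATE = auto()  # plain message passing; SDSM unaffected
        PRIVATE_TO_SHARED = auto()   # receive side enters the global space
        SHARED_TO_PRIVATE = auto()   # send side must be locally up to date
        SHARED_TO_SHARED = auto()    # both buffers are globally shared

    def classify(src_is_shared, dst_is_shared, same_region=False):
        """Map a transfer to the four types named in the paper.

        Shared-to-shared transfers are further split into symmetrical
        (same region of shared memory) and asymmetrical ones.
        """
        if src_is_shared and dst_is_shared:
            return Transfer.SHARED_TO_SHARED, (
                "symmetrical" if same_region else "asymmetrical")
        if src_is_shared:
            return Transfer.SHARED_TO_PRIVATE, None
        if dst_is_shared:
            return Transfer.PRIVATE_TO_SHARED, None
        return Transfer.PRIVATE_TO_PRIVATE, None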
With the Message Passing Interface (MPI), porting codes between distributed-memory platforms becomes as simple as moving the program to the target machine and recompiling. For codes written under the shared-memory paradigm to take advantage of this easy porting, they must first be translated to execute in a distributed-memory environment. The author focuses on the translation of non-numeric parallel algorithms from shared-memory to distributed-memory machines. Specifically, he presents techniques to determine where calls to MPI message passing routines must be inserted to preserve the data access patterns inherent in the original shared-memory code.
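As an illustration of the kind of insertion described, the hypothetical fragment below translates a 1D stencil: under shared memory each process could read its neighbor's boundary element directly, so the translated MPI code inserts a halo exchange to preserve that access pattern (array sizes and names are invented).

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n = 100
    u = np.random.rand(n + 2)  # interior cells plus two halo cells

    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # Inserted message-passing calls: exchange boundary elements so the
    # stencil sees the same values it read under shared memory.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[n + 1:n + 2], source=right)
    comm.Sendrecv(u[n:n + 1], dest=right, recvbuf=u[0:1], source=left)

    u[1:n + 1] = 0.5 * (u[0:n] + u[2:n + 2])  # the original stencil update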
This paper describes the methods used and the experience gained in implementing a finite element application on three different parallel computers, with either message passing or shared virtual memory as the programming model. Designing a parallel finite element application with message passing requires finding a domain decomposition that maps the data into the local memories of the processors. Since data accesses may be very irregular, communication patterns are unknown prior to the parallel execution, which makes parallelization a difficult task. We argue that the use of a shared virtual memory greatly simplifies the parallelization step. It is shown experimentally on an iPSC/2 hypercube that the KOAN/Fortran-S programming environment, based on a shared virtual memory, allows a sequential application to be ported quickly and easily without significant performance degradation compared with the message passing version. Results for more recent parallel architectures, the Paragon XP/S for message passing and the KSR1 for shared virtual memory, are also presented.
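To see why the communication pattern is "unknown prior to the parallel execution", consider deriving it from the mesh itself: with message passing, each processor's receive list must be computed from the element-node connectivity before the solve, whereas shared virtual memory fetches remote values automatically. The data below is illustrative.

    from collections import defaultdict

    elements = [(0, 1, 4), (1, 2, 4), (2, 3, 4)]  # element -> node ids
    node_owner = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}   # node -> processor
    elem_owner = {0: 0, 1: 0, 2: 1}               # element -> processor

    # Each processor must receive every off-processor node value its
    # elements touch; this bookkeeping is the hand-written part that
    # shared virtual memory makes unnecessary.
    recv_lists = defaultdict(set)
    for e, nodes in enumerate(elements):
        p = elem_owner[e]
        for node in nodes:
            if node_owner[node] != p:
                recv_lists[p].add((node_owner[node], node))

    print(dict(recv_lists))  # processor 0 must fetch nodes 2 and 4 from processor 1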