The Multicomputer Toolbox includes sparse, dense, and iterative scalable linear algebra libraries. This paper covers the dense direct and iterative linear algebra libraries, as well as the distributed data structures used to implement these algorithms; concurrent BLAS are covered elsewhere. We discuss uniform calling interfaces and functionality for linear algebra libraries. We include a detailed explanation of how the level-3 dense LU factorization works, including features that support data distribution independence with a blocked algorithm. We illustrate the data motion for this algorithm and for a representative iterative algorithm, PCGS. We conclude that data-distribution-independent libraries are feasible and highly desirable. Much work remains to be done in performance tuning of these algorithms, though good portability and application relevance have already been achieved.
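To make the structure of a level-3 (blocked) LU factorization concrete, the following is a minimal sequential sketch in C, not the Toolbox code itself: it omits pivoting, data distribution, and the concurrent BLAS, and the block size `b` and function name `lu_blocked` are illustrative assumptions. Each iteration factors a panel, triangularly solves for a block row of U, and applies a rank-b (matrix-matrix) update to the trailing submatrix, which is where the level-3 work lies.

```c
#include <stdio.h>

/* Illustrative right-looking blocked LU, no pivoting (not the Toolbox code).
 * A is n x n in row-major order; on exit it holds L (unit lower) and U. */
static void lu_blocked(double *A, int n, int b)
{
    for (int k = 0; k < n; k += b) {
        int kb = (k + b <= n) ? b : n - k;

        /* 1. Unblocked LU of the panel A[k:n, k:k+kb]. */
        for (int j = k; j < k + kb; j++)
            for (int i = j + 1; i < n; i++) {
                A[i*n + j] /= A[j*n + j];
                for (int jj = j + 1; jj < k + kb; jj++)
                    A[i*n + jj] -= A[i*n + j] * A[j*n + jj];
            }

        /* 2. Triangular solve for the U block row: L11 * U12 = A12. */
        for (int j = k + kb; j < n; j++)
            for (int i = k + 1; i < k + kb; i++)
                for (int p = k; p < i; p++)
                    A[i*n + j] -= A[i*n + p] * A[p*n + j];

        /* 3. Level-3 rank-kb update of the trailing submatrix:
         *    A22 -= L21 * U12 (the matrix-matrix, GEMM-like step). */
        for (int i = k + kb; i < n; i++)
            for (int j = k + kb; j < n; j++)
                for (int p = k; p < k + kb; p++)
                    A[i*n + j] -= A[i*n + p] * A[p*n + j];
    }
}

int main(void)
{
    /* Small diagonally dominant example so no pivoting is needed. */
    double A[16] = { 4, 1, 0, 1,
                     1, 5, 1, 0,
                     0, 1, 6, 1,
                     1, 0, 1, 7 };
    lu_blocked(A, 4, 2);
    for (int i = 0; i < 4; i++, puts(""))
        for (int j = 0; j < 4; j++)
            printf("%8.4f ", A[i*4 + j]);
    return 0;
}
```

In the distributed setting, steps 1-3 become calls on distributed panels and blocks, which is where data distribution independence and the concurrent BLAS enter.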
MPI is the de facto message-passing standard for multicomputers and networks of workstations, established by the MPI Forum, a group of universities, research centers, and national laboratories (from both the United States and Europe), as well as multi-national vendors in the area of high-performance computing. MPI has already been implemented by several groups, and worldwide acceptance has been quite rapid. This paper overviews several areas in which MPI can be extended, discusses the merits of making such extensions, and begins to demonstrate how some of these extensions can be made. In some areas, such as intercommunicator extensions, we have already made significant progress. In other areas (such as remote memory access), we are merely proposing extensions to MPI that we have not yet reduced to practice. We also note that other researchers are evidently working in parallel with us on their own extension concepts for MPI.
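As background for the intercommunicator discussion, here is a minimal sketch using only standard MPI-1 calls (MPI_Comm_split and MPI_Intercomm_create). It shows the base construct that intercommunicator extensions build upon, not the proposed extensions themselves; the group split, tag value, and message contents are arbitrary illustrative choices.

```c
#include <mpi.h>
#include <stdio.h>

/* Illustrative only: split MPI_COMM_WORLD into two groups and connect
 * them with an intercommunicator (run with at least 2 ranks). */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Lower half of the ranks form group 0, upper half group 1. */
    int color = (rank < size / 2) ? 0 : 1;
    MPI_Comm local;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local);

    /* Local leader is rank 0 of each group; the remote leader is named
     * by its rank in the peer communicator (MPI_COMM_WORLD here). */
    int remote_leader = (color == 0) ? size / 2 : 0;
    MPI_Comm inter;
    MPI_Intercomm_create(local, 0, MPI_COMM_WORLD, remote_leader,
                         /* tag */ 99, &inter);

    /* Leaders exchange one integer across the intercommunicator;
     * ranks in point-to-point calls refer to the remote group. */
    int lrank, msg = color, got = -1;
    MPI_Status st;
    MPI_Comm_rank(local, &lrank);
    if (lrank == 0) {
        MPI_Sendrecv(&msg, 1, MPI_INT, 0, 0,
                     &got, 1, MPI_INT, 0, 0, inter, &st);
        printf("group %d leader received %d from remote group\n", color, got);
    }

    MPI_Comm_free(&inter);
    MPI_Comm_free(&local);
    MPI_Finalize();
    return 0;
}
```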
We consider the systematic parallel solution of ordinary differential-algebraic equations (DAEs) of low index (including stiff ODEs). We target multicomputers, message-passing concurrent computers such as Intel's iPSC/2 hypercube and the Symult s2010 2D mesh. The programming model is reactive and/or loosely synchronized communicating sequential processes.
We present new approaches to efficient application-level message passing through the Zipcode communication layer (built upon the Caltech Reactive Kernel), which is shown to be both portable and effective for complex multicomputer codes. Zipcode promotes the elegant expression of message passing in large applications, an important sub-goal.
We present closed-form O(1)-memory, O(1)-time data distributions that provide parametric control over the degree of coefficient blocking and scattering. These new distributions permit effective formulations of the DAEs and higher sparse linear algebra performance.
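One standard closed-form family with exactly these O(1) properties is the block-cyclic (blocked-and-scattered) mapping, sketched below, in which the block size `b` and process count `P` play the roles of the blocking and scattering parameters. This is an illustration of the idea, not a reproduction of the paper's specific distributions.

```c
#include <stdio.h>

/* Illustrative block-cyclic mapping: global index <-> (owner, local index).
 * b = block size (degree of blocking), P = number of processes (scattering).
 * Both directions are closed-form, O(1) time and O(1) memory. */
static int  owner(long i, int b, int P)        { return (int)((i / b) % P); }
static long local_index(long i, int b, int P)  { return (i / ((long)b * P)) * b + i % b; }
static long global_index(int p, long l, int b, int P)
{
    return (l / b) * (long)b * P + (long)p * b + l % b;
}

int main(void)
{
    int b = 3, P = 4;
    for (long i = 0; i < 12; i++)
        printf("global %2ld -> process %d, local %ld\n",
               i, owner(i, b, P), local_index(i, b, P));

    /* Round-trip check for one index. */
    long i = 10;
    printf("round trip: %ld -> %ld\n",
           i, global_index(owner(i, b, P), local_index(i, b, P), b, P));
    return 0;
}
```

Small `b` with large `P` scatters coefficients for load balance; larger `b` keeps blocks contiguous for locality, which is the trade-off the parametric control exposes.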
We present results for concurrent sparse, unsymmetric linear algebra. A two-phase approach is used, like Harwell's MA28. New results include reduced-communication pivoting and improvement of triangular-solve performance via the parametric distributions, in which LU factorization load balance is traded against solve performance; overall performance is thereby increased. Good factorization speedups are attained for the examples, but exploitation of multiple concurrent pivots remains a needed extension. Triangular solves prove disappointing on an absolute scale, despite significant effort.
Two approaches to concurrent simulation are developed. The Waveform Relaxation (Picard-Lindelöf) methodology extends to binary distillation simulation and beyond, and is inherently very concurrent. We address the achievable concurrent performance of sequential approaches via Concurrent DASSL, which extends Petzold's DASSL algorithm to multicomputers. A simulation driver for arbitrary networks of distillation columns is described. For a 9009-integration-state system with seven distillation columns, we demonstrate a speedup of approximately five. The low speedup is attributable to the simplicity of the thermodynamic model used and the nearly narrow-banded Jacobian structure; other chemical-engineering systems could perform substantially better.
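To indicate the flavor of Waveform Relaxation and why it is inherently concurrent, the following is a toy sketch rather than the distillation code: a Jacobi-style WR sweep on a small linear ODE system x' = A x, in which each component is integrated over the whole time window (here with forward Euler for simplicity) while the other components' waveforms are frozen at the previous iterate. In a concurrent setting each component or subsystem could be integrated on a different processor, exchanging only whole waveforms between sweeps; all sizes, names, and the model problem are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

#define NVAR   2    /* number of ODE components (subsystems)      */
#define NSTEP  100  /* time steps per window                      */
#define SWEEPS 8    /* waveform-relaxation sweeps over the window */

int main(void)
{
    /* Toy stable linear system x' = A x, illustrative only. */
    const double A[NVAR][NVAR] = { { -2.0,  1.0 },
                                   {  1.0, -3.0 } };
    const double h = 0.01, x0[NVAR] = { 1.0, 0.0 };

    /* Waveforms over the whole window: previous and current iterates. */
    static double xold[NVAR][NSTEP + 1], xnew[NVAR][NSTEP + 1];

    /* Initial guess: constant waveforms equal to the initial condition. */
    for (int i = 0; i < NVAR; i++)
        for (int n = 0; n <= NSTEP; n++)
            xold[i][n] = x0[i];

    for (int sweep = 0; sweep < SWEEPS; sweep++) {
        /* Jacobi WR: each component is integrated independently over the
         * whole window, using the previous iterate for the other
         * components -- this loop is the concurrency opportunity. */
        for (int i = 0; i < NVAR; i++) {
            xnew[i][0] = x0[i];
            for (int n = 0; n < NSTEP; n++) {
                double f = A[i][i] * xnew[i][n];
                for (int j = 0; j < NVAR; j++)
                    if (j != i)
                        f += A[i][j] * xold[j][n];
                xnew[i][n + 1] = xnew[i][n] + h * f;  /* forward Euler */
            }
        }
        memcpy(xold, xnew, sizeof xold);
    }

    printf("x(%.2f) ~ (%.6f, %.6f) after %d WR sweeps\n",
           NSTEP * h, xold[0][NSTEP], xold[1][NSTEP], SWEEPS);
    return 0;
}
```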
We suggest Waveform Relaxation as the key focus of future research for the particular distillation problem class cited. We indicate future areas for application of Concurrent DASSL, and suggest ways to improve its concurrent performance, coupled with improvements in sparse linear algebra.
This paper discusses the proposed National High Performance Distributed Computing Consortium (NHPDCC), a second-generation high performance computing consortium. NHPDCC will build high performance computing solutions via distributed, heterogeneous machines with diverse, complementary capabilities. Unlike single-machine consortia, NHPDCC will strive to demonstrate practical solutions to current and future "challenge" and "grand-challenge" applications by utilizing multi-vendor, commercial products rather than prototypes. Specific software projects to be implemented by consortium personnel include the MPI message-passing standard and related tools, scalable parallel libraries, benchmarks, and systems software. Consortium members will benefit through access to both the hardware and software of the consortium. In addition, the synergy generated by interactions among NHPDCC members will be a significant benefit to them.
Advancement in communication technologies and the Internet of Things (IoT) is driving smart-city adoption, which aims to increase operational efficiency and improve the quality of services and citizen welfare. It is estimated that by 2020, 75% of cars shipped globally will be equipped with hardware to facilitate vehicle connectivity. The privacy, reliability, and integrity of communication must be ensured so that actions can be accurate and implemented promptly after actionable information is received. Because vehicles are equipped with the ability to compute, communicate, and sense their environment, there is a concomitant critical need to create and maintain trust among network entities despite the network's dynamism; trust between entities must be built and validated in the short time before they leave each other's range. In this work, we present a multi-tier scheme consisting of an authentication and trust building/distribution framework designed to ensure the safety and validity of the information exchanged in the system.