Acceleration of Communication-Aware Task Mapping Techniques through GPU Computing

2013 
The era of distributed computing, where applications are executed on platforms such as clusters, grids and clouds of computers, has shown the need to take into account the communications that take place on distributed computer architectures when executing applications. In that environment, different communication-aware mapping techniques have been proposed to improve system performance, both for off-chip and for on-chip networks. Some of these proposals are based on heuristic search to find pseudo-optimal assignments between a given population of tasks and processing elements. Technological improvements have allowed a significant increase in the problem size, multiplying the number of processor cores in each chip. Therefore, the proposals based on heuristic search must be accelerated in order to search larger exploration domains within the same execution times. In this paper, we present a comparative study of the parallel version of the local search method for communication-aware task mapping techniques. Unlike other comparative studies of heuristic methods implemented on GPUs, we compare the performance provided by the parallel version for GPUs with that provided by an MPI parallel version, in terms of execution times and fitness values obtained. The MPI version was executed on a cluster optimized for MPI applications. Also, we considered a GPU with Fermi architecture and mapped the local search algorithm onto the GPU in order to improve performance. The results show that the parallel implementation on a single GPU provides fitness function values similar to those of the MPI implementation on the cluster. However, the execution times required by the GPU implementation are significantly lower than those required by the MPI implementation, and these differences increase as the size of the parallel system grows.
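To illustrate the kind of computation the abstract refers to, the sketch below shows one GPU-parallel step of a local search for communication-aware task mapping: each CUDA thread evaluates one neighbor of the current task-to-PE mapping (obtained by swapping two task assignments) against a communication-cost fitness. All names, sizes, the 2D-mesh hop-distance model and the swap neighborhood are illustrative assumptions; the paper's actual fitness function and local search moves may differ.

```cuda
// Hypothetical sketch of one GPU-parallel local-search step for task mapping.
// Names (traffic, mapping, MESH_DIM, ...) are illustrative, not from the paper.
#include <cstdio>
#include <cuda_runtime.h>

#define N_TASKS 8          // tasks to place (toy size for the sketch)
#define MESH_DIM 4         // 4x4 mesh of processing elements
#define N_SWAPS (N_TASKS * (N_TASKS - 1) / 2)

// Manhattan hop distance between two PEs of a 2D mesh NoC.
__device__ int hopDistance(int peA, int peB) {
    int ax = peA % MESH_DIM, ay = peA / MESH_DIM;
    int bx = peB % MESH_DIM, by = peB / MESH_DIM;
    return abs(ax - bx) + abs(ay - by);
}

// Communication cost of a mapping: sum of traffic volume times hop distance.
__device__ int commCost(const int *traffic, const int *mapping) {
    int cost = 0;
    for (int i = 0; i < N_TASKS; ++i)
        for (int j = i + 1; j < N_TASKS; ++j)
            cost += traffic[i * N_TASKS + j] * hopDistance(mapping[i], mapping[j]);
    return cost;
}

// One thread per candidate swap: evaluate the neighbor obtained by
// exchanging the PEs assigned to tasks a and b.
__global__ void evaluateSwaps(const int *traffic, const int *mapping, int *swapCost) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= N_SWAPS) return;

    // Decode linear index k into the swap pair (a, b), with a < b.
    int a = 0, b = 0, idx = k;
    for (a = 0; a < N_TASKS - 1; ++a) {
        int row = N_TASKS - 1 - a;
        if (idx < row) { b = a + 1 + idx; break; }
        idx -= row;
    }

    int local[N_TASKS];
    for (int i = 0; i < N_TASKS; ++i) local[i] = mapping[i];
    int tmp = local[a]; local[a] = local[b]; local[b] = tmp;

    swapCost[k] = commCost(traffic, local);
}

int main() {
    // Toy traffic matrix and an initial identity mapping (task i -> PE i).
    int traffic[N_TASKS * N_TASKS] = {0};
    traffic[0 * N_TASKS + 7] = 10;   // task 0 talks heavily to task 7
    traffic[2 * N_TASKS + 5] = 4;
    int mapping[N_TASKS];
    for (int i = 0; i < N_TASKS; ++i) mapping[i] = i;

    int *dTraffic, *dMapping, *dCost;
    cudaMalloc(&dTraffic, sizeof(traffic));
    cudaMalloc(&dMapping, sizeof(mapping));
    cudaMalloc(&dCost, N_SWAPS * sizeof(int));
    cudaMemcpy(dTraffic, traffic, sizeof(traffic), cudaMemcpyHostToDevice);
    cudaMemcpy(dMapping, mapping, sizeof(mapping), cudaMemcpyHostToDevice);

    evaluateSwaps<<<(N_SWAPS + 127) / 128, 128>>>(dTraffic, dMapping, dCost);

    int cost[N_SWAPS];
    cudaMemcpy(cost, dCost, sizeof(cost), cudaMemcpyDeviceToHost);

    // Pick the best neighbor on the host; a full local search would repeat
    // this step, applying the swap while it improves the fitness.
    int best = 0;
    for (int k = 1; k < N_SWAPS; ++k)
        if (cost[k] < cost[best]) best = k;
    printf("best neighbor index %d, cost %d\n", best, cost[best]);

    cudaFree(dTraffic); cudaFree(dMapping); cudaFree(dCost);
    return 0;
}
```

An MPI variant of the same idea would partition the swap neighborhood across ranks and reduce the best cost with MPI_Allreduce, which is the kind of comparison (single GPU versus MPI cluster) the abstract describes.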