Exposing data locality in HPC-based systems by using the HDFS backend
2020
Nowadays, there are two main approaches for dealing with data-intensive applications: parallel file systems in classical High-Performance Computing (HPC) centers, and Big Data-style distributed file systems designed around a data-centric vision. Furthermore, there is a growing overlap between HPC and Big Data applications, given that the Big Data paradigm is a growing consumer of HPC resources. HDFS is one of the most important file systems for data-intensive applications, while, from the parallel file systems point of view, MPI-IO is the most widely used interface for parallel I/O. In this paper, we propose a novel solution that allows MPI-based parallel applications to take advantage of HDFS. To demonstrate its feasibility, we have integrated our approach into MIMIR, a MapReduce framework for MPI-based applications, and optimized MIMIR by exploiting the data-locality features that our approach provides. The experimental evaluation shows that our solution improves map-phase performance by around 25% compared with the MIMIR baseline.
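To illustrate what "exposing data locality" to MPI processes can mean, the following is a minimal, hypothetical sketch and not the authors' or MIMIR's implementation. It assumes each HDFS block's replica hostnames are already known (in a real deployment this metadata would come from HDFS, for example via Hadoop's `FileSystem.getFileBlockLocations`) and that each MPI rank knows the host it runs on. The names `assign_blocks` and `local_fraction`, and the block and host identifiers, are illustrative only. Each block is assigned to the least-loaded rank co-located with one of its replicas, falling back to any rank when no replica is local.

```python
from collections import defaultdict


def assign_blocks(block_hosts, rank_hosts):
    """Hypothetical locality-aware assignment of HDFS blocks to MPI ranks.

    block_hosts: dict mapping block id -> list of hostnames holding a replica
                 (assumed stand-in for HDFS block-location metadata).
    rank_hosts:  list where rank_hosts[r] is the hostname MPI rank r runs on.
    Returns a dict mapping rank -> list of assigned block ids.
    """
    # Group ranks by the host they run on.
    ranks_on_host = defaultdict(list)
    for rank, host in enumerate(rank_hosts):
        ranks_on_host[host].append(rank)

    load = [0] * len(rank_hosts)
    assignment = {r: [] for r in range(len(rank_hosts))}
    for block in sorted(block_hosts):
        # Ranks running on a host that stores a replica of this block.
        local = [r for h in block_hosts[block] for r in ranks_on_host.get(h, [])]
        # Fall back to all ranks (remote read) if no replica is co-located.
        candidates = local if local else range(len(rank_hosts))
        # Least-loaded candidate keeps work balanced; ties go to the lowest rank.
        target = min(candidates, key=lambda r: (load[r], r))
        assignment[target].append(block)
        load[target] += 1
    return assignment


def local_fraction(assignment, block_hosts, rank_hosts):
    """Fraction of blocks assigned to a rank on a host holding a replica."""
    total = sum(len(blocks) for blocks in assignment.values())
    local = sum(
        1
        for rank, blocks in assignment.items()
        for b in blocks
        if rank_hosts[rank] in block_hosts[b]
    )
    return local / total if total else 0.0


if __name__ == "__main__":
    blocks = {
        "blk_0": ["node1", "node2"],
        "blk_1": ["node2", "node3"],
        "blk_2": ["node3", "node1"],
        "blk_3": ["node1", "node3"],
    }
    ranks = ["node1", "node2", "node3"]
    plan = assign_blocks(blocks, ranks)
    print(plan)
    print(local_fraction(plan, blocks, ranks))
```

With replication, every block in this toy layout can be read locally, so each map task would avoid a network transfer; load balancing is kept simple here and a real scheduler would likely weigh block sizes and replica placement more carefully.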