Understanding I/O Bottlenecks and Tuning for High Performance I/O on Large HPC Systems: A Case Study

2018 
As we move toward peta- to exascale machines, large-scale physics-based simulations are expected to generate large amounts of I/O traffic, driven by unprecedented growth in the volume and types of data. To tune and improve overall application performance, it is imperative to understand and characterize the I/O behavior of scientific applications, including complex checkpoint/restart options, on different hardware-software configurations, including large shared parallel file systems, node-local flash, and burst buffer technologies. In this work, we study the I/O behavior of WRF, a widely used scientific application for atmospheric research and operational weather forecasting, on high-performance computing systems. WRF provides a rich collection of I/O strategies, such as different parallel I/O libraries (PnetCDF, NetCDF) and I/O quilting options with these libraries, as well as configurable I/O "knobs" that can be used to modify the I/O frequency. We evaluate the effectiveness of various I/O strategies within WRF in conjunction with parallel file system parameter tuning on the Comet and Stampede2 HPC systems. We discuss the impact of using various parallel I/O strategies and show how an I/O profiling tool can be used to analyze anomalous parallel I/O behavior. Overall, we provide a discussion of the tuning and performance insights gained from our evaluations.