Job Classification Through Long-Term Log Analysis Towards Power-Aware HPC System Operation
2021
Achieving high utilization of HPC system resources under constraints on electric power consumption or I/O workload is one of the primary goals in meeting high demand from application users. CPU and memory utilization, which is tightly related to electric power consumption, is the counterpart metric to I/O activity in most HPC jobs. To raise utilization of HPC systems while power consumption and I/O activity are managed under such restrictions, we must avoid hot-spots in power consumption or I/O operations, because a hot-spot can exceed the capacity of the electric power supply or the I/O subsystem and destabilize system operation. Analysis of large-scale log data collected from the K computer revealed a high correlation between I/O activity and CPU and memory utilization in some specific compute node layouts, exposing distinctive job characteristics such as being computation-intensive or I/O-intensive. Classifying jobs by required electric power turned out to divide them into two groups: jobs consuming high electric power and I/O-intensive jobs. We classified jobs with high accuracy using a machine learning approach, and we confirmed the effectiveness of the classification towards power-aware system operation in our next HPC system, the supercomputer Fugaku.