PIKA: Center-Wide and Job-Aware Cluster Monitoring

Robert Dietrich, Frank Winkler, Andreas Knüpfer, Wolfgang E. Nagel


2020


Performance optimization is by now a largely established procedure in high-performance computing (HPC) centers. To sustainably increase the compute efficiency of such systems, we need to raise awareness of efficiency on both the operators' and the users' side. We therefore propose an infrastructure for continuous monitoring and analysis that automatically characterizes HPC jobs and provides a systematic approach to identifying underperforming compute jobs with optimization potential. The recorded metadata and time-series data can be visualized live at runtime or post-mortem and are eventually stored for long-term analysis. The monitoring imposes negligible overhead on the compute nodes and neither influences nor limits user applications.

Keywords:

Continuous monitoring
Metadata
Data visualization
Operator (computer programming)
Data collection
Pika
Computer science
Database
