PADDLE: Performance Analysis Using a Data-Driven Learning Environment

2018 
The use of machine learning techniques to model execution time and power consumption, and, more generally, to characterize performance data is gaining traction in the HPC community. Although this signifies huge potential for automating complex inference tasks, a typical analytics pipeline requires selecting and extensively tuning multiple components ranging from feature learning to statistical inferencing to visualization. Further, the algorithmic solutions often do not generalize between problems, thereby making it cumbersome to design and validate machine learning techniques in practice. In order to address these challenges, we propose a unified machine learning framework, PADDLE, which is specifically designed for problems encountered during analysis of HPC data. The proposed framework uses an information-theoretic approach for hierarchical feature learning and can produce highly robust and interpretable models. We present user-centric workflows for using PADDLE and demonstrate its effectiveness in different scenarios: (a) identifying causes of network congestion; (b) determining the best performing linear solver for sparse matrices; and (c) comparing performance characteristics of parent and proxy application pairs.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    5
    Citations
    NaN
    KQI
    []