Automated Analysis of Time Series Data to Understand Parallel Program Behaviors

2018 
Traditionally, performance analysis tools have focused on collecting measurements, attributing them to program source code, and presenting them; responsibility for analysis and interpretation of measurement data falls to application developers. While profiles of parallel programs can identify the presence of performance problems, often developers need to analyze execution behavior over time to understand how and why parallel inefficiencies arise. With the growing scale of supercomputers, such manual analysis is becoming increasingly difficult. In many cases, performance problems of interest only appear at larger scales. Manual analysis of time series data from executions on extreme-scale parallel systems is daunting as the volume of data across processors and time makes it difficult to assimilate. To address this problem, we have developed an automated analysis framework that generates compact summaries of time series data for parallel program executions. These summaries provide users with high-level insight into patterns in the performance data and can quickly direct a user's attention to potential performance bottlenecks. We demonstrate the effectiveness of our framework by applying it to time-series measurements of two scientific codes.