MC-CChecker: A Clock-Based Approach to Detect Memory Consistency Errors in MPI One-Sided Applications

2018 
MPI one-sided communication decouples data movement from synchronization, which eliminates overhead from unneeded synchronization and allows for greater concurrency. This is the great advantage of MPI one-sided communication, but it also poses enormous challenges for programmers in preserving the reliability of their programs. Memory consistency errors are notorious for degrading the reliability as well as the performance of MPI one-sided applications, and even an MPI expert can easily make these mistakes. The lockopts bug, which occurred in an RMA test case that is part of the MPICH MPI implementation, is one example. Hence, detecting memory consistency errors is extremely challenging. MC-Checker is the state-of-the-art debugger for addressing these errors effectively. MC-Checker tackles memory consistency errors based on the happened-before relation. However, taking full advantage of the relation makes it difficult for the DN-Analyzer of MC-Checker to scale. For that reason, MC-Checker ignores the transitive ordering of the happened-before relation to retain the scalability of DN-Analyzer. Consequently, MC-Checker is exposed to a potential source of false positives. To overcome this issue, we present a novel clock-based approach called MC-CChecker, which fully preserves the happened-before relation by making use of an encoded vector clock. MC-CChecker inherits the distinguishing features of MC-Checker by reusing ST-Analyzer and Profiler, while focusing mainly on the optimization of DN-Analyzer. The experimental findings show that MC-CChecker not only detects memory consistency errors as effectively as MC-Checker did, but also completely eliminates the potential source of false positives that is a major limitation of MC-Checker, while still retaining acceptable execution-time and memory overheads for DN-Analyzer.
In particular, the DN-Analyzer of MC-CChecker scales fairly well when processing a large number of trace files generated by running lockopts on up to 8192 processes.
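To illustrate why a clock-based check can preserve the transitive ordering of the happened-before relation, the following is a minimal sketch using a plain (unencoded) vector clock. It is not the encoded vector clock used by MC-CChecker, and it does not reproduce DN-Analyzer; the function names (`tick`, `merge`, `happened_before`, `concurrent`) and the three-process scenario are hypothetical and chosen only for illustration.

```python
from typing import List

Clock = List[int]  # one counter per process (illustrative representation)


def tick(clock: Clock, pid: int) -> Clock:
    """Return a copy of `clock` with process `pid`'s counter advanced by one."""
    out = list(clock)
    out[pid] += 1
    return out


def merge(local: Clock, received: Clock, pid: int) -> Clock:
    """Combine clocks at a synchronization point, then count the receive event."""
    merged = [max(a, b) for a, b in zip(local, received)]
    merged[pid] += 1
    return merged


def happened_before(a: Clock, b: Clock) -> bool:
    """a -> b iff a is component-wise <= b and the two clocks differ."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: Clock, b: Clock) -> bool:
    """Two events are concurrent when neither happened before the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Hypothetical scenario with three processes P0, P1, P2.
zero = [0, 0, 0]
e0 = tick(zero, 0)               # P0: a memory access
send0 = tick(e0, 0)              # P0: synchronizes with P1
recv1 = merge(zero, send0, 1)    # P1: observes P0's synchronization
send1 = tick(recv1, 1)           # P1: synchronizes with P2
early2 = tick(zero, 2)           # P2: an access before any synchronization
recv2 = merge(early2, send1, 2)  # P2: observes P1's synchronization
e2 = tick(recv2, 2)              # P2: a later memory access

# e0 -> e2 holds only transitively (P0 -> P1 -> P2); the clocks still capture it.
print(happened_before(e0, e2))
# e0 and early2 are unordered, so conflicting accesses there would be suspect.
print(concurrent(e0, early2))
```

In this sketch, a checker that consulted only direct synchronization between P0 and P2 would find none and could flag `e0` and `e2` as conflicting, whereas the clock comparison orders them correctly. Only genuinely unordered pairs such as `e0` and `early2` remain as candidates for a memory consistency error.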