Race Condition and Deadlock Detection for Large-Scale Applications

2016 
Debugging large-scale parallel applications is a difficult problem. As applications scale, the number of errors grows exponentially and performance is affected in many ways. Because they incur unacceptable overhead and debugging time, traditional techniques such as checkpointing and record-and-replay have become obsolete for large-scale parallel applications. The arrival of exascale computing demands cutting-edge debugging techniques for large-scale parallel applications. Unlike prior work, which focuses on locating exact errors, we propose an on-the-fly approach that detects abnormal behaviors arising frequently in complicated message-passing channels. In this paper, the anomalies of interest are race conditions that cause hidden deadlocks, which can result in hangs and prevent programmers from inspecting errors manually. The technique employs a state-of-the-art detection algorithm related to allocation and management tactics. The precision and effectiveness of the proposed algorithm are demonstrated through theoretical proofs and experimental results. With acceptable overhead, this technique shows potential for large-scale parallel applications, especially those following the master/slave model.
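The abstract does not spell out the detection algorithm, so the following is only a generic illustration of on-the-fly deadlock detection in message passing, not the authors' method. It assumes a wait-for-graph reduction: each blocked process records which senders it is waiting on, and after every new block the graph is reduced to find processes that can no longer make progress. A wildcard receive (such as a master receiving from any slave, the usual source of message races in the master/slave model) is modeled as an "any of" (OR) wait; a receive that needs every listed sender is an "all of" (AND) wait. The class and method names (`WaitForGraph`, `block`, `unblock`, `deadlocked`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WaitForGraph:
    # Hypothetical sketch: waits[p] = (kind, targets) for each blocked process p.
    # A process absent from `waits` is assumed to be running and able to send.
    waits: dict = field(default_factory=dict)

    def block(self, proc, targets, any_of=False):
        """Record that `proc` blocks on a receive and return the deadlocked set.

        any_of=True models a wildcard receive: one message from any target suffices.
        any_of=False requires a message from every target (AND semantics).
        """
        targets = frozenset(targets)
        if not targets:
            raise ValueError("a blocked process must wait on at least one sender")
        self.waits[proc] = ("OR" if any_of else "AND", targets)
        return self.deadlocked()  # check on the fly, at every blocking event

    def unblock(self, proc):
        """Record that `proc` received its message and is running again."""
        self.waits.pop(proc, None)

    def deadlocked(self):
        """Graph reduction: mark processes that could still progress; the rest are stuck."""
        procs = set(self.waits)
        for _, targets in self.waits.values():
            procs |= targets
        free = {p for p in procs if p not in self.waits}  # running processes
        changed = True
        while changed:
            changed = False
            for p, (kind, targets) in self.waits.items():
                if p in free:
                    continue
                # AND waits need every sender free; OR waits need at least one.
                ok = targets <= free if kind == "AND" else bool(targets & free)
                if ok:
                    free.add(p)
                    changed = True
        return set(self.waits) - free
```

For example, if a master (process 0) posts a wildcard receive from slaves 1 and 2 while both slaves are themselves blocked waiting for work from the master, `g.block(0, [1, 2], any_of=True)` reports all three processes as deadlocked; if either slave is still running, the master's wait can be satisfied and nothing is reported. Full reduction on every blocking event keeps the sketch simple; a scalable detector would instead update the graph incrementally.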