Method for detecting errors generated during parallel running

2015 
The invention provides a method for detecting errors that arise during parallel execution. A first counter and a second counter are set, each with an initial value of 0. When a process enters an MPI blocking operation, the first counter is incremented by one and a timer is started; when the process returns from the blocking operation, the value of the first counter is assigned to the second counter and the timer is cancelled. If an MPI call remains blocked, a software interrupt signal is triggered when the timer expires, the process enters an interrupt handler, and the handler compares the current values of the two counters. If the values are not equal, the state is dumped and deadlock detection is then carried out; if they are equal, the process returns from the interrupt handler and the parallel program continues to execute.
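
The two-counter scheme described above can be illustrated with a minimal sketch. This is not the patented implementation: the class name `BlockingMonitor`, its methods, the `use_timer` switch, and the use of `SIGALRM` as the software interrupt are all assumptions made for illustration, and the state dump is reduced to recording the two counter values. A real tool would wrap the MPI blocking calls themselves and hand the dump to a deadlock detector.

```python
import signal


class BlockingMonitor:
    """Hypothetical per-process monitor following the two-counter scheme."""

    def __init__(self, timeout_s=5.0, use_timer=True):
        self.entered = 0    # first counter: blocking operations entered
        self.returned = 0   # second counter: copy of `entered` taken on return
        self.timeout_s = timeout_s
        self.use_timer = use_timer
        self.dumps = []     # stand-in for a real state dump
        if use_timer:
            # Assumption: SIGALRM plays the role of the software interrupt (Unix only).
            signal.signal(signal.SIGALRM, self._on_signal)

    def enter_blocking(self):
        """Call immediately before a blocking operation."""
        self.entered += 1
        if self.use_timer:
            signal.setitimer(signal.ITIMER_REAL, self.timeout_s)

    def leave_blocking(self):
        """Call immediately after the blocking operation returns."""
        self.returned = self.entered
        if self.use_timer:
            signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer

    def _on_signal(self, signum, frame):
        self.on_timeout()

    def on_timeout(self):
        """Interrupt handler body: compare the counters.

        Returns True if the process appears stuck (state recorded, deadlock
        detection would follow), or False if execution should simply continue.
        """
        if self.entered != self.returned:
            self.dumps.append((self.entered, self.returned))
            return True
        return False
```

In use, a wrapper around each blocking call would invoke `enter_blocking()` before the call and `leave_blocking()` after it. Because the counters only differ while a call is still outstanding, a timer that fires late, after the call has already returned, leads the handler to return without any dump.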