High performance fault tolerant computer and its fault recovery

1997 
The authors propose a new architecture for a fault-tolerant computer (FTC) called QPR (Quad Processor Redundancy), in which duplicated CPUs operate in hardware lock step and duplicated I/O units are managed by software. A dual system bus connects the two duplicated areas. After recovery from a fault, the system must be resynchronized, so the contents of main memory must be copied from the normal CPU to the other CPU. The copying overhead must be small so that the normal CPU can continue running the application. The authors describe a fault recovery method, focusing on memory copying: when a memory access occurs, the memory interface unit snoops the data and sends it to the other CPU over the dual system bus. They measured the copy time on a real machine and simulated the copy overhead under a heavy DMA load, obtaining a small overhead with little dependence on load.
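The snoop-and-forward idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class `SnoopingMemoryInterface`, the function `resynchronize`, the background read sweep over all addresses, and the random application writes are all hypothetical names and behaviours introduced here for illustration. The abstract says only that the memory interface unit snoops accessed data and sends it to the other CPU.

```python
import random


class SnoopingMemoryInterface:
    """Hypothetical model of a memory interface unit that mirrors accesses."""

    def __init__(self, size):
        self.local = [0] * size       # main memory of the normal (healthy) CPU
        self.remote = [None] * size   # memory of the recovering CPU; None = stale
        self.forwarded = 0            # number of words sent over the (modelled) bus

    def _forward(self, addr, value):
        # Stand-in for sending the snooped word over the dual system bus.
        self.remote[addr] = value
        self.forwarded += 1

    def write(self, addr, value):
        # Application write: update local memory, and the snooped data is mirrored.
        self.local[addr] = value
        self._forward(addr, value)

    def read(self, addr):
        # Read access: the data seen on the memory bus is mirrored as well.
        value = self.local[addr]
        self._forward(addr, value)
        return value


def resynchronize(miu, app_writes_per_step=2, seed=0):
    """Sweep all addresses once while the application keeps writing (assumed)."""
    rng = random.Random(seed)
    size = len(miu.local)
    for addr in range(size):
        # The application continues to run on the normal CPU during the copy.
        for _ in range(app_writes_per_step):
            miu.write(rng.randrange(size), rng.randrange(1, 256))
        # Assumed background sweep: reading each address triggers a snooped copy.
        miu.read(addr)
    return miu.remote == miu.local


if __name__ == "__main__":
    miu = SnoopingMemoryInterface(64)
    print("memories identical:", resynchronize(miu))
    print("words forwarded:", miu.forwarded)
```

In this model the two memories end up identical because every address is forwarded at least once by the sweep, and any later write to an address is forwarded again, so the copy never needs to stop the application. How the real machine schedules the copy and bounds its overhead is not stated in the abstract.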