SDLB: Scheduler with Dynamic Load Balancing for Heterogeneous Computers

2006 
Publisher Summary: This chapter discusses SDLB, the scheduler with dynamic load balancing for heterogeneous computers. SDLB is a distributed job scheduler that schedules jobs across multiple clusters of computers and balances load dynamically. The chapter describes the newly added features of SDLB. SDLB supports computers running in dedicated mode, batched mode, and multiuser interactive mode. A user can submit single or parallel jobs to specified machines, or let the scheduler find the most suitable machines for the jobs. An SDLB is installed in each cluster and can negotiate with the SDLBs of other clusters to acquire computation resources. The scheduler is also fault tolerant. When some of the computers or software supporting an application job fail, the scheduler detects the hanging job and automatically restarts it on other working computers. If the application job periodically stores checkpoint data, it is restarted from the latest checkpoint; otherwise, it is restarted from the beginning.
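The restart rule in the summary can be illustrated with a minimal Python sketch. It is not SDLB's implementation: the summary does not say how a hanging job is detected, so the heartbeat timeout used here is an assumption, and the names `Job`, `is_hung`, and `plan_restart` are hypothetical. The sketch only shows the decision described above: move a hung job to another working computer and resume from the latest checkpoint if one exists, otherwise from the beginning.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Job:
    """Hypothetical record the scheduler keeps for one application job."""
    name: str
    host: str                      # computer the job is currently running on
    last_heartbeat: float          # time (seconds) of the last sign of progress
    checkpoints: List[int] = field(default_factory=list)  # saved steps, oldest first


def is_hung(job: Job, now: float, timeout: float) -> bool:
    # Assumption: a job counts as hanging when no heartbeat arrived within `timeout`.
    return now - job.last_heartbeat > timeout


def plan_restart(job: Job, working_hosts: List[str],
                 now: float, timeout: float) -> Optional[Tuple[str, int]]:
    """Return (new_host, resume_step) for a hung job, or None if no restart applies."""
    if not is_hung(job, now, timeout):
        return None
    # Restart on a working computer other than the one the job hung on.
    candidates = [h for h in working_hosts if h != job.host]
    if not candidates:
        return None
    # Latest checkpoint if the job stored any; step 0 means "from the beginning".
    resume_step = job.checkpoints[-1] if job.checkpoints else 0
    return candidates[0], resume_step


job = Job(name="sim", host="nodeA", last_heartbeat=100.0, checkpoints=[10, 20])
print(plan_restart(job, ["nodeA", "nodeB"], now=200.0, timeout=30.0))
```

Picking the first remaining host keeps the sketch short; a scheduler with load balancing would instead rank the candidates by current load.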