Distributed job scheduling in MetaCentrum

2015 
MetaCentrum, the Czech National Grid, provides access to various resources across the Czech Republic. Its resource management and scheduling system is based on a heavily modified version of the Torque Batch System. This open-source resource manager is maintained in a local fork and has been extended to meet the requirements of such a large installation. This paper provides an overview of the unique features deployed in MetaCentrum. Notably, we describe our distributed setup, which encompasses several standalone, independent servers while still maintaining fully cooperative scheduling across the grid. We also present the benefits of our virtualized infrastructure, which enables our schedulers to dynamically request on-demand virtual machines. These machines are then used to meet the varied requirements of users in our system and to support user-requested virtual clusters that can be further interconnected using a private VLAN.