Aggregated monitoring and automatic site exclusion of the ATLAS computing activities: the ATLAS Site Status Board

2011 
In the context of the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN), ATLAS (A Toroidal LHC Apparatus) is one of the six particle detectors constructed at the accelerator. The ATLAS experiment generates large amounts of raw data, which are analysed by physicists and physics groups at tens of sites around the world. Various monitoring tools are deployed across these sites to check the status of the different computing activities. The ATLAS Site Status Board (SSB) is a framework that aggregates this information to monitor the overall status of the ATLAS distributed computing activities at the sites. In addition, we have used this monitoring information to build an infrastructure that automatically excludes sites from, and re-includes them in, the different activities on the basis of dynamic policies. In this paper we present the architecture of the infrastructure, its implementation details and the lessons learned.
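As an illustration only, policy-driven exclusion and re-inclusion of the kind described above could be sketched as follows. The abstract does not specify the policy format or the metrics used, so the `Policy` thresholds, the single per-site efficiency metric, the function names and the site names are all hypothetical; this is not the SSB implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: two thresholds on one efficiency metric in [0, 1]."""
    exclude_below: float  # an included site is excluded when it drops below this
    reinclude_at: float   # an excluded site is re-included at or above this


def next_state(excluded: bool, efficiency: float, policy: Policy) -> bool:
    """Return True if the site should be excluded after this evaluation."""
    if excluded:
        # Stay excluded until the metric recovers to the re-inclusion threshold.
        return efficiency < policy.reinclude_at
    return efficiency < policy.exclude_below


def update_exclusions(current: dict, metrics: dict, policy: Policy) -> dict:
    """Apply the policy to every site; sites without a fresh metric keep their state."""
    return {
        site: next_state(excluded, metrics[site], policy) if site in metrics else excluded
        for site, excluded in current.items()
    }


if __name__ == "__main__":
    policy = Policy(exclude_below=0.5, reinclude_at=0.8)
    current = {"SITE_A": False, "SITE_B": True, "SITE_C": True}
    metrics = {"SITE_A": 0.4, "SITE_B": 0.9}  # no fresh metric for SITE_C
    print(update_exclusions(current, metrics, policy))
```

Using two thresholds instead of one is a design assumption of this sketch: it keeps a site whose metric hovers around a single cut-off from being excluded and re-included on every evaluation.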