Monitoring the CMS data acquisition system

2010 
The CMS data acquisition system comprises O(20000) interdependent services that need to be monitored in near real-time. The ability to monitor a large number of distributed applications accurately and effectively is of paramount importance for robust operations. Application monitoring entails the collection of a large number of simple and composed values made available by the software components and hardware devices. A key aspect is that detection of deviations from a specified behaviour is supported in a timely manner, which is a prerequisite in order to take corrective actions efficiently. Given the size and time constraints of the CMS data acquisition system, efficient application monitoring is an interesting research problem. We propose an approach that uses the emerging paradigm of Web-service based eventing systems in combination with hierarchical data collection and load balancing. Scalability and efficiency are achieved by a decentralized architecture, splitting up data collections into regions of collections. An implementation following this scheme is deployed as the monitoring infrastructure of the CMS experiment at the Large Hadron Collider. All services in this distributed data acquisition system are providing standard web service interfaces via XML, SOAP and HTTP [15,22]. Continuing on this path we adopted WS-* standards implementing a monitoring system layered on top of the W3C standards stack. We designed a load-balanced publisher/subscriber system with the ability to include high-speed protocols [10,12] for efficient data transmission [11,13,14] and serving data in multiple data formats.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    10
    References
    11
    Citations
    NaN
    KQI
    []