Popularity framework to process dataset traces and its application on dynamic replica reduction in the ATLAS experiment

2011 
The ATLAS experiment's data management system is constantly tracing file movement operations that occur on the Worldwide LHC Computing Grid (WLCG). Due to the large scale of the WLCG, statistical analysis of the traces is infeasible in real time. Factors that contribute to the scalability problems include the capability for users to initiate on-demand queries, the high dimensionality of tracer entries combined with very low cardinality parameters, and the large size of the namespace. These scalability issues are alleviated through the adoption of an incremental model that aggregates data for all combinations occurring in selected tracer fields on a daily basis. Using this model it is possible to query relevant statistics about system usage on demand. We present an implementation of this popularity model in the experiment's distributed data management system, DQ2, and describe a direct application example of the popularity framework, an automated cleaning system, which uses the statistics to dynamically detect unpopular replicas and remove them from grid sites. This paper describes the architecture employed by the cleaning system and reports on the results collected from a prototype during the first months of ATLAS collision data taking.
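
The incremental aggregation and cleaning logic described above can be illustrated with a minimal sketch. The field names (dataset, site, day), the access threshold, and the look-back window below are illustrative assumptions for the example, not the actual DQ2 tracer schema or the cleaning policy used in production.

    from collections import Counter
    from datetime import date, timedelta

    # Hypothetical trace records; real DQ2 tracer entries carry many more fields.
    traces = [
        {"day": date(2011, 5, 1), "dataset": "data11_7TeV.A", "site": "SITE_1"},
        {"day": date(2011, 5, 1), "dataset": "data11_7TeV.A", "site": "SITE_1"},
        {"day": date(2011, 5, 2), "dataset": "mc10_7TeV.B",   "site": "SITE_2"},
    ]

    def aggregate_daily(traces, fields=("dataset", "site")):
        """Daily aggregation: count accesses per combination of selected tracer fields."""
        daily = {}
        for t in traces:
            key = tuple(t[f] for f in fields)
            daily.setdefault(t["day"], Counter())[key] += 1
        return daily

    def unpopular_replicas(daily, window_days=30, today=None, min_accesses=1):
        """Flag replicas whose aggregated access count over the window falls below a threshold."""
        today = today or date.today()
        cutoff = today - timedelta(days=window_days)
        totals = Counter()
        for day, counts in daily.items():
            if day >= cutoff:
                totals.update(counts)
        all_replicas = {k for counts in daily.values() for k in counts}
        return [r for r in all_replicas if totals[r] < min_accesses]

    daily = aggregate_daily(traces)
    print(unpopular_replicas(daily, window_days=30, today=date(2011, 6, 1)))

In this sketch the daily aggregates can be queried on demand without rescanning raw traces, and a cleaning pass would only consider replicas returned by the threshold check; the actual selection criteria and safeguards used by the ATLAS cleaning system are described in the paper itself.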