Managing Large Data Productions in LHCb

2009 
LHC experiments are producing very large volumes of data, either accumulated from the detectors or generated via Monte-Carlo modeling. The data should be processed as quickly as possible to provide users with the input for their analyses. Processing hundreds of terabytes of data requires the generation, submission and tracking of a huge number of grid jobs running all over the Computing Grid. Handling such large and complex workloads is impossible without powerful production management tools. In LHCb, the DIRAC Production Management System (PMS) is used to accomplish this task. It enables production managers and end-users to deal with all kinds of data generation, processing and storage. Application workflow tools allow jobs to be defined as complex sequences of elementary application steps expressed as Directed Acyclic Graphs. Specialized databases and a number of dedicated software agents ensure automated, data-driven job creation and submission. The productions are accompanied by thorough checks of the resulting data integrity. The PMS provides a complete user interface for operations, from requests generated by the user community through to task completion and bookkeeping. Both a command-line interface and a full-featured Web-based Graphical User Interface allow all tasks of production definition, control and monitoring to be performed. This facilitates the job of the production managers, allowing a single person to steer all the LHCb production activities. In the paper we provide a detailed description of the DIRAC PMS components and their interactions with the other DIRAC subsystems. The experience with real large-scale productions is presented and the further evolution of the system is discussed.
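To illustrate the idea of a production workflow as a Directed Acyclic Graph of elementary application steps, the following is a minimal, self-contained Python sketch. It is not the DIRAC workflow API; the class names, the Step fields and the example application names (Gauss, Boole, Brunel, the LHCb simulation, digitization and reconstruction applications) are used here only as an assumed illustration of how step dependencies can be declared and resolved into an execution order.

```python
# Minimal sketch (not the actual DIRAC API): a production workflow expressed
# as a Directed Acyclic Graph of elementary application steps, with a
# topological sort giving the execution order of the steps within a job.
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str                                   # logical step name, e.g. "Simulation"
    application: str                            # application run in this step
    depends_on: list = field(default_factory=list)  # names of prerequisite steps


class WorkflowDAG:
    def __init__(self):
        self.steps = {}

    def add_step(self, step: Step):
        self.steps[step.name] = step

    def execution_order(self):
        """Return step names in an order that respects all dependencies."""
        indegree = {name: 0 for name in self.steps}
        children = defaultdict(list)
        for step in self.steps.values():
            for dep in step.depends_on:
                indegree[step.name] += 1
                children[dep].append(step.name)
        queue = deque(name for name, deg in indegree.items() if deg == 0)
        order = []
        while queue:
            name = queue.popleft()
            order.append(name)
            for child in children[name]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
        if len(order) != len(self.steps):
            raise ValueError("workflow contains a cycle, not a DAG")
        return order


# Example: a simplified Monte-Carlo production chain (assumed step layout).
wf = WorkflowDAG()
wf.add_step(Step("Simulation", "Gauss"))
wf.add_step(Step("Digitization", "Boole", depends_on=["Simulation"]))
wf.add_step(Step("Reconstruction", "Brunel", depends_on=["Digitization"]))
print(wf.execution_order())  # ['Simulation', 'Digitization', 'Reconstruction']
```

In the real system such a workflow definition would be stored by the PMS and instantiated into grid jobs by the dedicated software agents as matching input data become available; the sketch only shows how the DAG structure constrains the order of the application steps.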