A lightweight high availability strategy for Atlas LCG File Catalogs

2010 
The LCG File Catalog (LFC) is a key component of the LHC Computing Grid middleware [1], as it contains the mapping between Logical File Names and Physical File Names on the Grid. The Atlas computing model foresees multiple local LFCs, housed at the Tier-0 and at each Tier-1, each containing all information about the files stored in its regional cloud. As the contents of the local LFCs are presently not replicated anywhere, each catalog is a dangerous single point of failure for its Atlas regional cloud. To solve this problem we propose a novel solution for high availability (HA) of Oracle-based Grid services, obtained by combining an Oracle Data Guard deployment with a series of application-level scripts. This approach has the advantage of being very easy to deploy and maintain, and it is a good candidate solution for all Tier-2s, which are usually small centres with little manpower dedicated to service operations. We also present the results of a wide range of functionality and performance tests run on a test-bed with characteristics similar to those required for production. The test-bed consists of a failover deployment between the Italian LHC Tier-1 (INFN – CNAF) and an Atlas Tier-2 located at INFN – Roma1. Moreover, we explain how the proposed strategy can be deployed on the present Grid infrastructure without requiring any change to the middleware and in a way that is totally transparent to end users and applications.