A Popularity-Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management

T. A. Beermann, G. A. Stewart, Peter Maettig


2014

This paper presents a system that predicts future data popularity for data-intensive systems such as ATLAS Distributed Data Management (DDM). These predictions can be used to improve the distribution of data and to reduce waiting times for the jobs that use it. The system is built on a tracer infrastructure that monitors and stores historical data accesses; this history is then used to create popularity reports. These reports summarise past data accesses, including information about the accessed files, the users involved, and the sites. From this past access information it is possible to make near-term forecasts of data popularity. The prediction system introduced in this paper uses both simple prediction methods and neural networks. The best prediction method depends on the type of data, and the access information is carefully filtered for use in either system. The second part of the paper introduces a system that places data effectively based on the predictions. This is a two-phase process: in the first phase, space is freed by removing unpopular replicas; in the second, new replicas are created for popular datasets. The creation of new replicas is subject to constraints: only a limited amount of space is available, and creating replicas involves transfers that consume bandwidth. Furthermore, the benefit of each replica differs. The goal is to maximise the global benefit while respecting these constraints. The final part of the paper evaluates this method using a grid simulator, which replays workloads on different data distributions while measuring job waiting time. We show how job waiting time can be reduced based on accurate predictions of future accesses.
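The abstract does not spell out the prediction methods or the placement algorithm, so the following is only a minimal Python sketch of the two-phase idea under stated assumptions. A moving-average forecast stands in for the "simple prediction methods" (the neural-network predictor is not shown). The benefit of a replica is assumed to be the predicted accesses divided by the number of existing replicas. New replicas are chosen with a greedy benefit-per-gigabyte heuristic, a common approximation for knapsack-style problems, within a disk-space budget and a transfer (bandwidth) budget. The `Dataset` class, `predict_next`, `redistribute`, and all parameters are hypothetical illustrations, not the paper's DDM implementation.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical dataset record with a per-week access history."""
    name: str
    size_gb: float
    weekly_accesses: list[int]
    replicas: int


def predict_next(history: list[int], window: int = 3) -> float:
    """Assumed simple predictor: mean of the last `window` weekly access counts."""
    recent = history[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def redistribute(datasets, free_space_gb, transfer_budget_gb, unpopular_threshold=1.0):
    """Two-phase placement sketch: remove unpopular replicas, then add popular ones greedily."""
    forecast = {d.name: predict_next(d.weekly_accesses) for d in datasets}
    removed, added = [], []

    # Phase 1: free space by deleting one surplus replica of each dataset
    # predicted to be unpopular (the last replica is always kept).
    for d in datasets:
        if forecast[d.name] < unpopular_threshold and d.replicas > 1:
            d.replicas -= 1
            free_space_gb += d.size_gb
            removed.append(d.name)

    # Phase 2: assumed benefit = predicted accesses per existing replica;
    # candidates are ranked by benefit per GB (greedy knapsack heuristic).
    def benefit(d):
        return forecast[d.name] / d.replicas

    candidates = sorted(datasets, key=lambda d: benefit(d) / d.size_gb, reverse=True)
    for d in candidates:
        if forecast[d.name] < unpopular_threshold:
            continue  # do not replicate data predicted to be unpopular
        # A new replica consumes disk space and the same volume of transfer bandwidth.
        if d.size_gb <= free_space_gb and d.size_gb <= transfer_budget_gb:
            d.replicas += 1
            free_space_gb -= d.size_gb
            transfer_budget_gb -= d.size_gb
            added.append(d.name)

    return removed, added, free_space_gb
```

For example, with three made-up datasets where `data.B` has almost no recent accesses, phase 1 drops one of its replicas to free 200 GB, and phase 2 then spends that space on extra replicas of the heavily accessed `data.A` and `data.C`. A greedy pass is used here only because it is short and easy to follow; an exact optimiser for the global benefit would need an integer-programming or dynamic-programming formulation.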
