A Flexible GridFTP Client for Scheduling of Big Data Transfers

Esma Yildirim

2013

Big Data, generated in massive amounts by digital sources ranging from scientific instruments and business transactions to social networks, has changed the way we understand and handle data. It has led the scientific and business communities, as well as governments, to focus on urgently needed technologies and policies that provide novel tools for the management, analysis, access and scheduling of Big Data. These tools have to be flexible and scalable enough to manage data at exascale with the help of data centers that can hold thousands of compute and storage nodes interconnected by high-speed networks. In this study, we target the performance improvement that can be achieved by scheduling big data transfers and provide a flexible client based on GridFTP, a widely adopted protocol. The latest client provided by the Globus Toolkit project does not meet the needs of intelligent, optimized data transfer algorithms. With this flexible client, developers can implement various kinds of scheduling algorithms and apply optimization techniques such as pipelining, parallelism and concurrency in much less restricted use cases. The ability to enqueue, dequeue, combine, sort and divide data transfers into groups made these techniques easy to apply, resulting in throughput improvements in high-speed networks. The client was used to implement two different algorithms that exploited its abilities and provided performance improvements in both cases compared to baseline GridFTP and optimized UDT results.
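The queue operations named in the abstract (enqueue, dequeue, combine, sort and divide into groups) can be illustrated with a minimal sketch. The abstract does not describe the client's actual interface, so every name below (`Transfer`, `TransferQueue`, `sort_by_size`, `divide`, `combine`) and the size thresholds in the usage note are hypothetical, and the code performs no GridFTP operations; it only models the bookkeeping a scheduler might do before issuing transfers.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One pending file transfer (hypothetical structure, not the client's type)."""
    path: str
    size: int  # file size in bytes


class TransferQueue:
    """Illustrative FIFO queue of transfers with sort and divide helpers."""

    def __init__(self):
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def enqueue(self, transfer):
        """Append a transfer to the back of the queue."""
        self._items.append(transfer)

    def dequeue(self):
        """Remove and return the transfer at the front of the queue."""
        return self._items.popleft()

    def sort_by_size(self):
        """Reorder the queue so that the smallest files come first."""
        self._items = deque(sorted(self._items, key=lambda t: t.size))

    def divide(self, boundaries):
        """Split the queue into len(boundaries) + 1 groups by file size.

        `boundaries` is an ascending list of byte thresholds. A transfer
        lands in group i when its size is at least i of the thresholds.
        """
        groups = [[] for _ in range(len(boundaries) + 1)]
        for transfer in self._items:
            index = sum(transfer.size >= b for b in boundaries)
            groups[index].append(transfer)
        return groups


def combine(groups):
    """Merge several groups back into a single queue, keeping group order."""
    merged = TransferQueue()
    for group in groups:
        for transfer in group:
            merged.enqueue(transfer)
    return merged
```

For example, calling `queue.divide([1_000_000, 1_000_000_000])` on a queue of mixed files would yield three groups (small, medium, large). A scheduling algorithm could then, as one possible design, pick different pipelining, parallelism and concurrency settings for each group before handing it to the transfer client.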
