Provisioning wide-area virtual environments through I/O interposition: the redirect-on-write file system and characterization of I/O overheads in a virtualized platform

2008 
This dissertation presents mechanisms to provision and characterize I/O workloads for applications found in virtual data centers. It addresses two specific modes of workload execution in a virtual data center: (1) workload execution on heterogeneous compute resources across wide-area environments, and (2) workload execution and characterization within a virtualized platform. A key challenge arising in wide-area grid computing infrastructures is that of data management: how to provide data to applications, seamlessly, in environments spanning multiple domains. In these environments, data movement and sharing are often mediated by middleware that schedules applications. This thesis presents a novel approach, the Redirect-on-Write file system (ROW-FS), that enables wide-area applications to leverage on-demand block-based data transfers and a de facto distributed file system (NFS) to access data stored remotely and modify it in the local area. The ROW-FS approach enables multiple clients to operate on private, virtual versions of data mounted from a single shared data set served as a network file system (NFS). It also enables multiple VM instances to efficiently share a common set of virtual machine image files. The proposed approach offers savings in storage and bandwidth requirements compared to the conventional approaches of provisioning VMs by copying the entire VM image to the client and by cloning the image on the server side. The thin-client approach described in this dissertation uses ROW-FS to enable the use of unmodified NFS clients and servers and local buffering of file system modifications during an application's lifetime. An important application of ROW-FS is in enabling the instantiation of multiple non-persistent virtual machines across wide-area resources from read-only images stored on an image server (or distributed across multiple replicas).
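The redirect-on-write idea described above can be illustrated with a minimal sketch. It is not the ROW-FS implementation, which interposes on NFS traffic between unmodified clients and servers; here an in-memory dictionary stands in for the shared read-only image, and the class name `RedirectOnWriteStore` and its methods are hypothetical names chosen for illustration. Reads fall through to the shared base unless a block has been modified, and writes are redirected to a private per-client shadow, so the base is never changed.

```python
class RedirectOnWriteStore:
    """Toy block store (hypothetical, for illustration only).

    Reads fall through to a shared read-only base; writes are
    redirected to a private shadow that belongs to one client.
    """

    def __init__(self, base_blocks):
        self._base = base_blocks  # shared image, never modified here
        self._shadow = {}         # private: block number -> bytes

    def read(self, block_no):
        # Prefer the private copy if this client has modified the block.
        if block_no in self._shadow:
            return self._shadow[block_no]
        return self._base[block_no]

    def write(self, block_no, data):
        # Redirect the write: only the shadow changes.
        self._shadow[block_no] = data

    def redirected_blocks(self):
        # Block numbers that now live in the private shadow.
        return sorted(self._shadow)


# Two clients (for example, two VM instances) share one base image.
base = {0: b"boot", 1: b"kernel", 2: b"rootfs"}
vm_a = RedirectOnWriteStore(base)
vm_b = RedirectOnWriteStore(base)
vm_a.write(2, b"rootfs-A")
```

After the write, `vm_a` sees its own version of block 2 while `vm_b` and the base image still hold the original, which is the property that lets several non-persistent VM instances share one read-only image. Discarding the shadow returns a client to the pristine image.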
A common deployment scenario of ROW-FS is one in which the virtual machine hosting the private, redirected "shadow" file system server and the client virtual machine are consolidated onto a single physical machine. While a virtual machine provides levels of execution isolation and service partitioning that are desirable in environments such as data centers, its associated overheads can be a major impediment to wide deployment of virtualized environments. The cost of virtualization depends heavily on the workload, and the overhead is much higher for I/O-intensive workloads than for compute-intensive ones. Unfortunately, the architectural reasons behind the I/O performance overheads are not well understood. Early research in characterizing these penalties has shown that cache misses and TLB-related overheads contribute to most of the I/O virtualization cost. While most of these evaluations are done using measurements, this thesis presents an execution-driven, simulation-based analysis methodology with symbol annotation as a means of evaluating the performance of virtualized workloads, and presents a simulation-based characterization of the performance of a representative network-intensive benchmark (iperf) in the Xen virtual machine environment. The main contributions of this dissertation are: (1) the novel design and implementation of the ROW-FS file system, (2) experimental evaluation of ROW-FS for the O/S image framework that enables virtual machine images to be published, discovered and transferred on demand through a combination of ROW-FS and peer-to-peer techniques, (3) a novel implementation of an execution-driven simulation framework to evaluate network I/O performance using symbol annotation for environments that encompass both a virtual machine hypervisor and guest operating system domains, and (4) evaluation, through simulation, of the potential benefits of different micro-architectural TLB improvements on performance.
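To make concrete the kind of question a TLB study asks, the following is a toy trace-driven model, not the execution-driven full-system simulation the dissertation describes. It assumes a fully associative TLB with LRU replacement and 4 KiB pages; the function name `tlb_miss_rate`, the page size, and the synthetic trace are all assumptions introduced for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


def tlb_miss_rate(addresses, entries):
    """Miss rate of a fully associative LRU TLB over an address trace."""
    if not addresses:
        return 0.0
    tlb = OrderedDict()  # virtual page number -> placeholder, in LRU order
    misses = 0
    for addr in addresses:
        page = addr // PAGE_SIZE
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict the least recently used page
    return misses / len(addresses)


# Synthetic trace: cycle over 8 distinct pages, 4 times (32 accesses).
trace = [p * PAGE_SIZE for p in range(8)] * 4
small = tlb_miss_rate(trace, entries=4)
large = tlb_miss_rate(trace, entries=8)
```

With this cyclic trace, the 4-entry TLB misses on every access because the working set exceeds its capacity, while the 8-entry TLB misses only on the first pass. Real virtualized workloads add effects this sketch ignores, such as TLB flushes on switches between the hypervisor and guest domains.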