RepBun: Load-Balanced, Shuffle-Free Cluster Caching for Structured Data

2020 
Cluster caching systems increasingly store structured data objects in the columnar format. However, these systems routinely face the imbalanced load that significantly impairs the I/O performance. Existing load-balancing solutions, while effective for reading unstructured data objects, fall short in handling columnar data. Unlike unstructured data that can only be read through a full-object scan, columnar data supports direct query of specific columns with two distinct access patterns: (1) columns have the heavily skewed popularity, and (2) hot columns are likely accessed together in a query job. Based on these two access patterns, we propose an effective load-balancing solution for structured data. Our solution, which we call RepBun, groups hot columns into a bundle. It then copies multiple replicas of the column bundle and stores them uniformly across servers. We show that RepBun achieves improved load balancing with reduced memory overhead, while avoiding data shuffling between cache servers. We implemented RepBun atop Alluxio, a popular in-memory distributed storage, and evaluate its performance through EC2 deployment against the TPC-H benchmark work-load. Experimental results show that RepBun outperforms the existing load-balancing solutions with significantly shorter read latency and faster query completion.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    17
    References
    0
    Citations
    NaN
    KQI
    []