PROADAPT: Proactive framework for adaptive partitioning for big data warehouses

2022 
Parallel DBMSs have become more and more mature and getting several success stories in the industry. This situation has been reached by powerful data partitioning and data allocation techniques and algorithms. By analyzing these findings closely, we figure out that they are quickly stressed by the workload changes, which represents the usual case of business analytics applications. To deal with this challenge in the context of big data warehouses, several studies proposed to move to another processing paradigm outside the DBMS realm such as Spark by proposing adaptive partitioning solutions to tackle the workload changes. The majority of approaches are offline and those that are online cause significant random disk I/O costs. This is because the correlation that may exist between jobs and data blocks read from the disk is not captured to refine the adaptive partitioning algorithms. This represents one of the major causes of providing high performance of dynamic workloads. To solve such limitations, we propose in this paper a proactive framework for query-aware adaptive partitioning (called PROADAPT) that uses an AI-inspired methodology that can be connected to any query optimizer managing partitioned data. This methodology uses a genetic algorithm to solve our formalized problem that considers the interaction that may exist among workload queries. PROADAPT intensively rewrites queries by exploiting dimension hierarchies to skip irrelevant data and then improves I/O performance. Different technical modules of our framework are discussed. Finally, we conduct intensive experiments on Postgres-XL and a Spark SQL parallel cluster to show the effectiveness and efficiency of our approach.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []