Investigating Automatic Parameter Tuning for SQL-on-Hadoop Systems

2021 
Abstract SQL-on-Hadoop engines such as Hive provide a declarative interface for processing large-scale data over computing frameworks such as Hadoop. The underlying frameworks expose a large number of configuration parameters that can significantly impact performance but are hard to tune. Automatic parameter tuning has become a lively research area, and several sophisticated tuning advisors have been proposed for Hadoop. In this paper, we conduct an experimental study to explore the impact of Hadoop parameter tuning on Hive. We show that the performance of Hive queries does not necessarily improve when Hadoop-focused tuning advisors are used out-of-the-box, at least under the current approach of applying the same tuning setup uniformly to the evaluation of the entire query. After extending the Hive query processing engine, we propose an alternative tuning approach and show experimentally that current Hadoop tuning advisors can then provide good and robust performance for Hive queries, as well as improved cluster resource utilization. We share our observations with the community in the hope of raising awareness of this problem and initiating new research on automatic parameter tuning for SQL-on-Hadoop systems.