Investigating Automatic Parameter Tuning for SQL-on-Hadoop Systems

Edson Ramiro Lucas Filho, Eduardo Cunha de Almeida, Stefanie Scherzinger, Herodotos Herodotou

2021

Abstract: SQL-on-Hadoop engines such as Hive provide a declarative interface for processing large-scale data over computing frameworks such as Hadoop. The underlying frameworks expose a large number of configuration parameters that can significantly impact performance but are hard to tune. Automatic parameter tuning has become a lively research area, and several sophisticated tuning advisors have been proposed for Hadoop. In this paper, we conduct an experimental study to explore the impact of Hadoop parameter tuning on Hive. We reveal that the performance of Hive queries does not necessarily improve when Hadoop-focused tuning advisors are used out-of-the-box, at least when following the current approach of applying the same tuning setup uniformly to the evaluation of the entire query. After extending the Hive query processing engine, we propose an alternative tuning approach and show experimentally that current Hadoop tuning advisors can then provide good and robust performance for Hive queries, as well as improved cluster resource utilization. We share our observations with the community in the hope of raising awareness of this problem and initiating new research on automatic parameter tuning for SQL-on-Hadoop systems.
