Comparison of Hive's query optimisation techniques

2018 
The ever-increasing size of data sets in the big data era has forced data analytics to move from traditional RDBMSs to distributed technologies such as Hadoop. Since data analysts are more familiar with SQL than with the MapReduce programming paradigm, HiveQL was built on Hadoop's MapReduce framework. Traditional RDBMS query optimisation techniques, as used in Hive's rule-based optimiser (RBO), do not perform well in the MapReduce environment; hence, the correlation optimiser (CRO) and the cost-based optimiser (CBO) were developed. These optimisers take the MapReduce execution framework into account when optimising queries. In this work, the three optimisers, RBO, CRO, and CBO, are compared. Queries with common intra-query operations are found to be better optimised with CRO.