Skew-insensitive Parallel Algorithms for Relational Join

2001 
Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of the data skew. In this paper, we present two new parallel join algorithms for coarse-grained machines, which work optimally in presence of arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both of these algorithms employ a preprocessing phase (prior to the redistribution phase) to equally partition the work among the processors. These algorithms are shown to be theoretically as well as practically scalable. Experimental results are provided on the IBM SP-2.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    26
    References
    0
    Citations
    NaN
    KQI
    []