Skew-insensitive Parallel Algorithms for Relational Join

Khaled Alsabti, Sanjay Ranka

2001

Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of data skew. In this paper, we present two new parallel join algorithms for coarse-grained machines that work optimally in the presence of an arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both algorithms employ a preprocessing phase, prior to the redistribution phase, to partition the work equally among the processors. The algorithms are shown to be scalable both in theory and in practice. Experimental results are provided on the IBM SP-2.
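The abstract does not spell out how the preprocessing phase divides the work, so the following is only a generic illustration of the idea, not the paper's sort-based or hash-based algorithm. It assumes an equi-join on integer keys, estimates the work for each key as the number of output pairs it produces, and assigns keys to processors greedily (largest first, to the least-loaded processor). The function name `balanced_key_assignment` and its parameters are hypothetical.

```python
from collections import Counter
import heapq


def balanced_key_assignment(r_keys, s_keys, num_procs):
    """Assign join keys to processors so estimated join work is roughly even.

    Illustrative sketch only: a key is never split across processors here,
    so a single very heavy key can still dominate one processor's load.
    """
    r_counts = Counter(r_keys)
    s_counts = Counter(s_keys)

    # Estimated work for a key = number of joined pairs it contributes.
    work = {k: r_counts[k] * s_counts[k] for k in r_counts if k in s_counts}

    # Min-heap of (current load, processor id): the least-loaded processor
    # is always at the top.
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)

    assignment = {}
    # Place the heaviest keys first; ties are broken by key for determinism
    # (this assumes the keys are mutually comparable, e.g. integers).
    for key, w in sorted(work.items(), key=lambda kv: (-kv[1], kv[0])):
        load, proc = heapq.heappop(heap)
        assignment[key] = proc
        heapq.heappush(heap, (load + w, proc))

    # Recompute the final per-processor loads from the assignment.
    loads = [0] * num_procs
    for key, proc in assignment.items():
        loads[proc] += work[key]
    return assignment, loads
```

Calling `balanced_key_assignment(r_keys, s_keys, p)` before redistributing tuples gives each processor a set of keys whose combined estimated output size is close to the average. Plain hash partitioning, by contrast, ignores key frequencies and can send most of the work to one processor when a few keys dominate.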

Keywords:

Recursive join
Hash join
Skew
Block nested loop
Sort-merge join
Parallel computing
Relational database
Parallel algorithm
Scalability
Distributed computing
Computer science
