Top k Knapsack Joins Work in Progress

2010 
We call knapsack join (K-Join) the join mainly under the condition that the sum of the join attributes is at most some constant. More generally, a K-join can be a θ - join where θ is the usual comparison operator. Our naming originates in the knapsack problem, among most studied in the combinatorial optimization. K-joins appear interesting in practice, especially for the financial community. The constant is typically a budget. The join attributes are possible expenditures, to be chosen from large sets. Usually only the Top k tuples of the K-Join matching best the constant are of interest. We call this calculus the TkK-Join. Current DBMSs calculate K-Joins by the generic nested loop. The result is usually by far too slow in practice. This appears a weakness in the basic relational DBMS design. We propose optimizations making TkK-Join queries practical. They may speed-up a TkK-Join by orders of magnitude with respect to the generic calculus. In finale, we foresee the usual timing in, at most, seconds for tables with thousands of tuples and 6-way TkK-Joins. Using clouds should allow in addition for 4-way TkK-Joins in milliseconds and for 8–way TkK-Joins in dozens of seconds. Further analysis should concern the formal and experimental performance evaluation. Likewise, there are promising variants of our proposals. With all the caution required TkK-Joins seem ready for the mainstream DBMSs.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    5
    References
    0
    Citations
    NaN
    KQI
    []