A Method of Data Distribution for Distributed Cross Join

2013 
One of the major challenges in big data processing is the efficiency of the cross join, which arises in tasks such as similarity calculation in business intelligence. In this paper we introduce an optimal data distribution algorithm for the distributed cross join, an operation that combines each row from the first table with each row from the second table. The algorithm reduces network traffic and guarantees balanced computation across the distributed system.
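To make the two goals concrete, the following is a minimal sketch of one well-known way to distribute a cross join: arranging workers in a p x q grid. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The function names `grid_partition`, `local_cross_join`, and `network_traffic`, and the round-robin block assignment, are assumptions made purely for illustration.

```python
from itertools import product


def grid_partition(rows_r, rows_s, p, q):
    """Assign rows to a hypothetical p x q grid of workers.

    Worker (i, j) receives the i-th block of R and the j-th block of S,
    so every (r, s) pair is produced by exactly one worker.
    """
    # Round-robin slicing keeps block sizes within one row of each other.
    r_blocks = [rows_r[i::p] for i in range(p)]
    s_blocks = [rows_s[j::q] for j in range(q)]
    return {
        (i, j): (r_blocks[i], s_blocks[j])
        for i in range(p)
        for j in range(q)
    }


def local_cross_join(r_block, s_block):
    """Cross join computed locally on one worker."""
    return list(product(r_block, s_block))


def network_traffic(n_r, n_s, p, q):
    """Rows shipped in total: each R row goes to q workers, each S row to p."""
    return n_r * q + n_s * p


# Illustrative data only; not taken from the paper.
R = list(range(6))
S = ["a", "b", "c", "d"]

assignment = grid_partition(R, S, p=3, q=2)
loads = {w: len(local_cross_join(r, s)) for w, (r, s) in assignment.items()}
pairs = [
    pair
    for r_blk, s_blk in assignment.values()
    for pair in local_cross_join(r_blk, s_blk)
]
```

In this sketch the choice of p and q trades replication against parallelism: the rows shipped grow as |R|·q + |S|·p, while the per-worker load is roughly |R|·|S| / (p·q). Balancing these two quantities is the kind of problem a data distribution method for the cross join has to address.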