Groupement de clés efficace pour un équilibrage de charge quasi-optimal dans les systèmes de traitement de flux

2016 
Key grouping is a technique used by stream processing frame- works to simplify the development of parallel stateful opera- tors. Through key grouping a stream of tuples is partitioned in several disjoint sub-streams depending on the values con- tained in the tuples themselves. Each operator instance tar- get of one sub-stream is guaranteed to receive all the tuples containing a specific key value. A common solution to imple- ment key grouping is through hash functions that, however, are known to cause load imbalances on the target operator instances when the input data stream is characterized by a skewed value distribution. In this paper we present DKG, a novel approach to key grouping that provides near-optimal load distribution for input streams with skewed value distri- bution. DKG starts from the simple observation that with such inputs the load balance is strongly driven by the most frequent values; it identifies such values and explicitly maps them to sub-streams together with groups of less frequent items to achieve a near-optimal load balance. We provide theoretical approximation bounds for the quality of the map- ping derived by DKG and show, through both simulations and a running prototype, its impact on stream processing applications.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []