Concept-Driven Load Shedding: Reducing Size and Error of Voluminous and Variable Data Streams

2018 
Load shedding is a technique that aims to ameliorate the consequences of the Velocity and the Volume of Big Data stream processing. When temporary input spikes appear, tuples are shed so that a Stream Processing Engine's (SPE's) processing capacity is not overwhelmed and results are still produced in a timely fashion. Existing load shedding techniques have become obsolete: they are not applicable to modern use cases that require the extraction of patterns from continuously evolving (i.e., Variable) voluminous streams. In this work, we identify the shortcomings of existing load shedding techniques when applied to streams with concept drift. We propose Concept-Driven load shedding (CoD), which aims to limit the data volume imposed on the SPE while producing highly accurate results. In addition, we designed CoD for modern SPEs and kept its overhead negligible. Our experiments indicate that CoD can deliver results more than 10× more accurate than the state of the art in load shedding. CoD can also offer up to 2.25× better performance compared to normal processing and significantly reduce the processed data volume.
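To make the basic idea of load shedding concrete, the sketch below shows a generic random load shedder of the kind commonly used as a baseline. It is not CoD, whose mechanism is not described in this abstract. It assumes that tuples arrive in fixed windows and that the SPE can process at most `capacity` tuples per window. Both `shed_batch` and `estimate_sum` are hypothetical helpers written for illustration. When a window exceeds capacity, a uniform random subset is kept. An aggregate such as a sum is then approximated by scaling the kept tuples back up, which is where shedding trades accuracy for timeliness.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shed_batch(batch: Sequence[T], capacity: int, rng: random.Random) -> List[T]:
    """Hypothetical generic random load shedder (a baseline, not CoD).

    Forwards at most `capacity` tuples of a window to the SPE.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    if len(batch) <= capacity:
        # No overload: every tuple is forwarded unchanged.
        return list(batch)
    # Overload: keep a uniform random subset of size `capacity`,
    # preserving the arrival order of the kept tuples.
    kept_idx = sorted(rng.sample(range(len(batch)), capacity))
    return [batch[i] for i in kept_idx]


def estimate_sum(kept: Sequence[float], total_count: int) -> float:
    """Approximate the window's sum by scaling up the kept tuples' sum."""
    if not kept:
        raise ValueError("cannot estimate from an empty sample")
    return sum(kept) * total_count / len(kept)


if __name__ == "__main__":
    rng = random.Random(42)
    window = list(range(1, 101))  # 100 tuples arrive; the SPE can take 25
    kept = shed_batch(window, 25, rng)
    print(len(kept), sum(window), estimate_sum(kept, len(window)))
```

Uniform random shedding treats every tuple as equally informative. That assumption is what the abstract argues breaks down once the stream's underlying concept drifts.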