DCS: A Policy Framework for the Detection of Correlated Data Streams

2019 
There is an increasing demand for real-time analysis of large volumes of data streams that are produced at high velocity. The most recent data needs to be processed within a specified delay target in order for the analysis to lead to actionable result. To this end, in this paper, we present an effective solution for detecting the correlation of such data streams within a micro-batch of a fixed time interval. Our solution, coined DCS, for Detection of Correlated Data Streams, combines (1) incremental sliding-window computation of aggregates, to avoid unnecessary re-computations, (2) intelligent scheduling of computation steps and operations, driven by a utility function within a micro-batch, and (3) an exploration policy that tunes the utility function. Specifically, we propose nine policies that explore correlated pairs of live data streams across consecutive micro-batches. Our experimental evaluation on a real world dataset shows that some policies are more suitable to identifying high numbers of correlated pairs of live data streams, already known from previous micro-batches, while others are more suitable to identifying previously unseen pairs of live data streams across consecutive micro-batches.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    17
    References
    1
    Citations
    NaN
    KQI
    []