Toward Sliding Time Window of Low Watermark to Detect Delayed Stream Arrival

2021 
Unexpected events such as time gaps between input streams, operator misoperation, and network delay can cause a stream processing system to produce unbounded out-of-order data streams. Recent work on this issue focuses on explicit punctuation or heartbeats to handle faults and stragglers (outlier data). Most parallel and distributed stream processing models, such as Google MillWheel and Apache Flink, rely on hot replication, logging, and upstream backup, all of which are expensive, and they ignore straggler processing. These recent frameworks handle disorder only at the operator level, and they discuss only point-in-time and fixed-window low watermarks. We therefore propose a new sliding time window of low watermarks to detect delayed stream arrival. The contributions of our method are adaptive low watermarks, distinguishing stragglers from late data, and dynamic rectification of the low watermark. Experiments show that our method tolerates more late data while still detecting stragglers accurately.
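The abstract does not give the algorithm itself, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a low watermark computed as the newest event time minus the largest delay seen in a sliding window of recent records, and it assumes a record is a straggler when it trails the watermark by more than a fixed multiple of that delay. The class name `SlidingLowWatermark` and the parameters `window_size` and `straggler_factor` are hypothetical.

```python
from collections import deque


class SlidingLowWatermark:
    """Toy low watermark that adapts to delays seen in a sliding window.

    Illustrative assumption only; the paper's actual rules are not
    described in the abstract.
    """

    def __init__(self, window_size=4, straggler_factor=3.0):
        # Most recent (arrival_time - event_time) delays; old ones fall out.
        self.delays = deque(maxlen=window_size)
        self.max_event_time = float("-inf")
        self.straggler_factor = straggler_factor

    def _slack(self):
        # Largest delay currently inside the sliding window.
        return max(self.delays) if self.delays else 0.0

    def watermark(self):
        # Assumed rule: newest event time minus the largest recent delay.
        return self.max_event_time - self._slack()

    def observe(self, event_time, arrival_time):
        """Classify one record against the current watermark, then update."""
        wm = self.watermark()
        lag = wm - event_time  # how far the record trails the watermark
        if lag <= 0:
            label = "on_time"
        elif lag <= self.straggler_factor * max(self._slack(), 1.0):
            label = "late"
        else:
            label = "straggler"
        # Assumed rule: stragglers are outliers and do not widen the window,
        # while late records do, which pushes the watermark back.
        if label != "straggler":
            self.delays.append(arrival_time - event_time)
        self.max_event_time = max(self.max_event_time, event_time)
        return label


if __name__ == "__main__":
    wm = SlidingLowWatermark()
    records = [(100, 101), (102, 103), (99, 104), (105, 106), (50, 107)]
    for event_time, arrival_time in records:
        print(event_time, wm.observe(event_time, arrival_time))
```

In this sketch, the record with event time 99 arrives slightly behind the watermark and is labeled late; its delay enters the window and lowers the watermark for later records. The record with event time 50 trails far enough to be labeled a straggler and leaves the window unchanged.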