A Performance Comparison of Open-Source Stream Processing Platforms

2016 
Distributed stream processing platforms is a new class of real-time monitoring systems that analyze and extracts knowledge from large continuous streams of data. This type of systems is crucial for providing high throughput and low latency required by Big Data or Internet of Things monitoring applications. This paper describes and analyzes three main open-source distributed stream- processing platforms: Storm Flink, and Spark Streaming. We analyze the system architectures and we compare their main features. We carry out two experiments concerning anomaly detection on network traffic to evaluate the throughput efficiency and the resilience to node failures. Results show that the performance of native stream processing systems, Storm and Flink, is up to 15 times higher than the micro-batch processing system, Spark Streaming. On the other hand, Spark Streaming is more robust to node failures and provides recovery without losses.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    17
    References
    44
    Citations
    NaN
    KQI
    []