Storage and Ingestion Systems in Support of Stream Processing: A Survey
2018
Under the pressure of massive, exponentially growing volumes of heterogeneous
data generated at ever-increasing rates, Big Data analytics applications have
shifted from batch processing to stream processing, which can dramatically
reduce the time needed to obtain meaningful insight.
Stream processing is particularly well suited to address the challenges of fog/edge
computing: much of this massive data comes from Internet of Things (IoT)
devices and needs to be continuously funneled through an edge infrastructure
towards centralized clouds. Thus, it is only natural to process data as much as
possible while they are in transit rather than wait for streams to accumulate in
the cloud. Unfortunately, state-of-the-art stream processing systems are not well
suited for this role: the data are accumulated (ingested), processed and
persisted (stored) separately, often using different services hosted on
different physical machines/clusters. Furthermore, there is only limited support for
advanced data manipulations, which often forces application developers to
introduce custom solutions and workarounds. In this survey article, we
characterize the main state-of-the-art stream storage and ingestion systems.
We identify the key aspects and discuss limitations and missing features in
the context of stream processing for fog/edge and cloud computing. The goal is to
help practitioners understand and prepare for potential bottlenecks when using
such state-of-the-art systems. In particular, we discuss both functional
(partitioning, metadata, search support, message routing, backpressure
support) and non-functional aspects (high availability, durability,
scalability, latency vs. throughput). As a conclusion of our study, we
advocate for a unified stream storage and ingestion system to speed up data
management and reduce I/O redundancy (both in terms of storage space and
network utilization).
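
To make the criticized separation concrete, the minimal sketch below (not taken from the survey itself) shows a typical pipeline in which ingestion, processing, and storage are handled by three independently deployed services: records are accumulated by Apache Kafka, transformed by an Apache Flink job, and persisted to a shared file system. The topic name, broker address, consumer group, and output path are hypothetical placeholders; Kafka, Flink, and HDFS stand in for the class of state-of-the-art systems such a survey covers.

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class IngestProcessPersist {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Ingestion: records are accumulated by a separate Kafka cluster
            // (broker address, group id, and topic name are placeholders).
            Properties kafkaProps = new Properties();
            kafkaProps.setProperty("bootstrap.servers", "kafka-broker:9092");
            kafkaProps.setProperty("group.id", "iot-readings");
            DataStream<String> readings = env.addSource(
                    new FlinkKafkaConsumer<>("sensor-events", new SimpleStringSchema(), kafkaProps));

            // Processing: the stream processing engine transforms records on the fly.
            DataStream<String> normalized = readings.map(String::toUpperCase);

            // Storage: results are persisted by yet another service
            // (here, a row-format file sink on a shared file system; path is a placeholder).
            normalized.addSink(
                    StreamingFileSink.forRowFormat(
                            new Path("hdfs:///streams/normalized"),
                            new SimpleStringEncoder<String>("UTF-8")).build());

            env.execute("ingest-process-persist");
        }
    }

Each stage runs on its own cluster and keeps its own copy of the data, which is precisely the storage and network redundancy that a unified stream storage and ingestion system would avoid.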