Sampling the Join of Streams
2010
One of the most critical operators for a Data Stream Management System is the join operator. Unfortunately, the join between two streams A and B is a blocking operator: for each current tuple of stream A, the entire stream B has to be scanned. The usual technique for unblocking stream operators is to restrict processing to a sliding window. This technique emphasizes recent data, which are considered more relevant than old data. However, a Data Stream Management System needs a general approach that can join any data streams for any application. Our approach is to treat the data stream join as an estimation problem. The estimation model is simple and generic: one reservoir per data stream is used to model the join. The quality of the join estimator depends on the frequencies of the join keys in the join. We propose four algorithms to feed the reservoirs. The proposed methods outperform the reservoir sampling approach on synthetic and real data streams.
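As a point of reference, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of plain reservoir sampling (one fixed-size reservoir per stream) combined with a simple scaled estimate of the join size. It is not one of the four proposed algorithms, which the abstract does not describe. The names `Reservoir`, `offer`, and `estimate_join_size` are hypothetical, and each stream element is assumed to be just its join key.

```python
import random
from collections import Counter


class Reservoir:
    """Fixed-size uniform sample of a stream (classic reservoir sampling)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []   # sampled join keys
        self.seen = 0     # number of stream elements observed so far
        self.rng = rng or random.Random()

    def offer(self, key):
        """Observe one stream element, identified by its join key."""
        self.seen += 1
        if len(self.items) < self.capacity:
            # Reservoir not full yet: keep every element.
            self.items.append(key)
        else:
            # Keep the new element with probability capacity / seen,
            # replacing a uniformly chosen resident.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = key


def estimate_join_size(res_a, res_b):
    """Estimate |A join B| by joining the two samples and scaling up.

    Assumption: the samples are independent and uniform, so each joined
    pair survives with probability (n_a / N_a) * (n_b / N_b).
    """
    if not res_a.items or not res_b.items:
        return 0.0
    freq_a = Counter(res_a.items)
    freq_b = Counter(res_b.items)
    # Size of the join of the two samples, from per-key frequencies.
    sample_join = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() & freq_b.keys())
    scale = (res_a.seen / len(res_a.items)) * (res_b.seen / len(res_b.items))
    return sample_join * scale
```

When a reservoir is large enough to hold its whole stream, the scale factor is 1 and the estimate equals the exact join size. With smaller reservoirs the estimate depends heavily on how often each join key is sampled on both sides, which matches the abstract's remark that estimator quality is driven by join-key frequencies.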