On the Organization of Cluster Voting with Massive Distributed Streams

Adi Alhudhaif, Tong Yan, Simon Y. Berkovich

2014

Data processing is one of the major challenges in Big Data. In this paper we investigate optimal processing algorithms for massive data streams and propose a new one, the multi-buffer-based majority algorithm. The algorithm maintains O(n) time complexity and selects prevalent elements with frequencies as low as 1%. Our experiments indicate that the multi-buffer-based majority algorithm improves both accuracy and efficiency. We also use the multi-buffer-based algorithm to process data streams on a single system and on a distributed system; these experiments indicate that it performs better on the distributed system. Finally, we explain the experimental results and identify several major factors that influence result accuracy: stream size, the range of elements in the stream, the frequency of the predominant elements, and our buffer sets.
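
The abstract does not describe the internals of the multi-buffer-based majority algorithm, so the sketch below is not the authors' method. As a stand-in it uses the classical Misra-Gries frequent-items algorithm, a generalization of Boyer-Moore majority voting that keeps k-1 counters (loosely, "buffers"). In one pass it guarantees that every element occurring more than n/k times survives as a candidate, in amortized O(n) total time, so k = 100 covers the 1% threshold mentioned above. For the distributed case it assumes, as a simplification, that the stream is split into partitions that can each be rescanned once. Candidates from every partition are unioned and then verified with exact counts. This works because an element above the global threshold must exceed n_i/k in at least one partition. The function names `misra_gries` and `frequent_elements` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One pass with at most k-1 counters. Any element whose frequency
    exceeds n/k is guaranteed to remain among the returned candidates;
    the stored counts are lower bounds, not exact frequencies."""
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all counters and drop those reaching zero.
            # Total decrements never exceed total increments, so this is amortized O(1).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def frequent_elements(partitions: List[List[Hashable]], threshold: float) -> Set[Hashable]:
    """Return elements whose global frequency is strictly greater than
    threshold * n, where the stream is split across several partitions
    (a hypothetical model of a distributed system)."""
    k = math.ceil(1 / threshold)  # k counters per partition catch anything above n/k
    candidates: Set[Hashable] = set()
    for part in partitions:  # first pass: local candidate sets (parallelizable)
        candidates.update(misra_gries(part, k))
    n = sum(len(part) for part in partitions)
    exact: Counter = Counter()
    for part in partitions:  # second pass: exact counts for candidates only
        exact.update(x for x in part if x in candidates)
    return {x for x, c in exact.items() if c > threshold * n}
```

For example, `frequent_elements([["x", "y", "x"], ["z", "x"]], 0.5)` returns `{"x"}`, because "x" occurs 3 times out of 5. The second pass is what removes false positives: Misra-Gries alone can report candidates that fall below the threshold.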
