On the Organization of Cluster Voting with Massive Distributed Streams

2014 
Data processing is one of the important challenges on Big Data. In this paper we investigate optimal processing algorithm for massive data streams, propose a new processing algorithm called multi-buffer based majority algorithm. The algorithm maintains time complexity of O(n) and selects prevalent elements of frequencies as low as 1%. Our experiments indicate that multi-buffer based majority algorithm has improvements on both accuracy and efficiency. Moreover, we use multibuffer based algorithm to process data streams on single system and distributed system. These experiments indicate that using multi-buffer based algorithm can have better performance on distributed system. Moreover, we give explanations of the experiments' result and indicate several major factors which influence the result accuracy: stream size, element range in the stream, frequency of predominant elements and our buffer sets.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    21
    References
    6
    Citations
    NaN
    KQI
    []