An Online Incremental Clustering Framework for Real-Time Stream Analytics

2019 
With the evolution of data acquisition methods, our ability to collect real-time data has increased. This calls for real-time analytics that use the most recent data to generate valuable insights. One example is customer profiling, where we want to identify groups of similar clients who were recently active in order to improve the quality of suggestions. Traditional clustering algorithms perform well on finite datasets, but their execution time is often incompatible with real-time requirements, especially when trends change rapidly. In this context, we propose a novel approach for defining incremental clustering algorithms that work within real-time constraints, in an online fashion, while preserving accuracy. We show the general applicability of the framework by applying it to three different clustering algorithms. We compare the experimental results of the traditional and online approaches, evaluating accuracy and computational cost. The results show that algorithms executed in our framework are comparable to their offline implementations in terms of accuracy, with a high gain in execution time of up to three orders of magnitude on average.
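To make the contrast between offline and incremental clustering concrete, the following is a minimal sketch of sequential (MacQueen-style) k-means, in which each arriving point triggers a single centroid update instead of a full re-clustering. It is not the framework proposed in the paper, whose details are not given in this abstract; the class name OnlineKMeans, its update method, and the choice of seeding centroids with the first k points are all assumptions made for illustration.

```python
import math


class OnlineKMeans:
    """Illustrative sequential k-means (hypothetical helper, not the paper's method)."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one coordinate list per cluster
        self.counts = []     # number of points absorbed by each cluster so far

    def update(self, point):
        """Absorb one streamed point and return the index of its cluster."""
        point = [float(x) for x in point]

        # Assumption: the first k points seed the k centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(point)
            self.counts.append(1)
            return len(self.centroids) - 1

        # Find the nearest centroid by Euclidean distance.
        idx = min(range(self.k), key=lambda i: math.dist(point, self.centroids[i]))

        # Move that centroid toward the point by 1/n, which keeps it equal
        # to the running mean of all points assigned to it.
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]
        self.centroids[idx] = [
            c + rate * (x - c) for c, x in zip(self.centroids[idx], point)
        ]
        return idx


if __name__ == "__main__":
    model = OnlineKMeans(k=2)
    for p in [(0, 0), (10, 10), (0, 2), (10, 12)]:
        print(p, "-> cluster", model.update(p))
    print(model.centroids)
```

Each update costs O(k) distance computations regardless of how many points have already been seen, which is the kind of per-point cost that makes an online formulation attractive for streams. A plain running mean weighs old and new points equally, so a scheme aimed at recently active data would likely need some form of forgetting or windowing, which this sketch does not attempt.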