Studying and analyzing on data streams mining technique based on clustering method

Journal of Zhejiang University of Technology (2007)

Abstract:

With the development of data gathering and communication technologies, it has become increasingly possible to support real-time monitoring of large amounts of information from diverse sources. A data stream is a massive, unbounded sequence of data elements continuously generated at a rapid rate. For this reason, traditional data mining approaches are being replaced by systems that can mine continuous, high-volume, open-ended data streams as they arrive. This paper introduces a new algorithm that uses a clustering method to improve data stream mining. We studied data stream clustering using the K-Means algorithm, a statistical grid-based algorithm, and regression analysis, and compared these techniques.

Keywords:

Data stream clustering

Topics:

Advanced Clustering Algorithms Research

Data Stream Mining Techniques

An algorithm for clustering data streams using incremental DFT

Journal of North China Electric Power University (2007)

Yunfeng Liu

Clustering data streams is one of the important branches of data stream mining. Because data streams are dynamic and massive, traditional data mining algorithms cannot satisfy the requirements of online analysis. Data stream technologies focus on designing one-pass scan algorithms over the data set and on incrementally maintaining an effective synopsis data structure (digest) in memory that is far smaller than the whole data set. A novel algorithm for clustering data streams is presented in this paper. In this algorithm, the means method is used for subset division, a sliding window model is used for data changing and updating, and a DFT digest, which can be incrementally maintained, is used for data reduction. The algorithm saves main memory and run time, so it is suitable for online clustering. Experiments on clustering real electricity consumption data verify the effectiveness of the presented algorithm.
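The abstract does not say how the DFT digest is updated. One standard way to maintain DFT coefficients over a sliding window is the sliding-DFT recurrence, sketched below as an illustration only. The class name `SlidingDFT`, the helper `direct_dft`, and the choice to keep just the first few coefficients are assumptions, not details from the paper.

```python
import cmath
from collections import deque


class SlidingDFT:
    """Hypothetical digest: the first n_coeffs DFT coefficients of the last `window` values."""

    def __init__(self, window, n_coeffs):
        self.values = deque([0.0] * window, maxlen=window)  # window starts as all zeros
        self.coeffs = [0j] * n_coeffs
        # Rotation factors exp(2*pi*i*k/N), one per retained coefficient.
        self.twiddles = [cmath.exp(2j * cmath.pi * k / window) for k in range(n_coeffs)]

    def push(self, x_new):
        """Slide the window by one value in O(n_coeffs) time."""
        x_old = self.values[0]
        self.values.append(x_new)  # the deque drops the oldest value automatically
        delta = x_new - x_old
        # Recurrence: X_k(new) = (X_k(old) - x_old + x_new) * exp(2*pi*i*k/N)
        self.coeffs = [(c + delta) * w for c, w in zip(self.coeffs, self.twiddles)]
        return self.coeffs


def direct_dft(values, n_coeffs):
    """Reference O(N * n_coeffs) computation, used only to check the recurrence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]
```

Each update touches only the retained coefficients, so the digest stays much smaller than the window itself. Rounding error accumulates slowly over long streams, so a practical version would probably recompute the coefficients directly from time to time.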

Data stream clustering

Sliding window protocol

Data set

Citations (0)

Tweet Cluster Analyzer: Partition and Join-based Micro-clustering for Twitter Data Stream

Advances in intelligent systems and computing (2017)

M. Arun Manicka Raja S. Swamynathan

Data stream clustering

Stream Processing

10.1007/978-981-10-3874-7_64

Citations (1)

An Adaptive Density Data Stream Clustering Algorithm

Cognitive Computation (2015)

Shifei Ding Jian Zhang Hongjie Jia Jun Qian

Data stream clustering

10.1007/s12559-015-9342-z

Citations (37)

A Kind of Data Stream Clustering Algorithm Based on Grid-Density

Communications in computer and information science (2011)

Zhong Zhishui

Data stream clustering

10.1007/978-3-642-23324-1_67

Citations (2)

Correlating synchronous and asynchronous data streams

Sudipto Guha Dimitrios Gunopulos Nick Koudas

In a variety of modern mining applications, data are commonly viewed as infinite time-ordered data streams rather than as finite data sets stored on disk. This view challenges fundamental assumptions commonly made in the context of several data mining algorithms. In this paper, we study the problem of identifying correlations between multiple data streams. In particular, we propose algorithms capable of capturing correlations between multiple continuous data streams in a highly efficient and accurate manner. Our algorithms and techniques are applicable in both synchronous and asynchronous data streaming environments. We capture correlations between multiple streams using the well-known technique of Singular Value Decomposition (SVD). Correlations between data items, and the SVD technique in particular, have been repeatedly utilized in off-line (non-stream) data mining problems, for example forecasting, approximate query answering, and data reduction. We propose a methodology based on a combination of dimensionality reduction and sampling to make the SVD technique suitable for a data stream context. Our techniques are approximate, trading accuracy for performance, and we analytically quantify this tradeoff. We present a thorough experimental evaluation, using both real and synthetic data sets, from a prototype implementation of our technique, investigating the impact of various parameters on the accuracy of the overall computation. Our results indicate that correlations between multiple data streams can be identified very efficiently and accurately. The algorithms proposed herein are presented as generic tools, with a multitude of applications to data stream mining problems.
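To illustrate why SVD exposes correlation, the sketch below measures how much of the variance in one window of synchronized readings lies along the top singular direction. It is an exact, small-scale illustration and does not reproduce the paper's dimensionality reduction or sampling. The function name `top_singular_energy`, the use of power iteration, and the fixed start vector are assumptions made for the example.

```python
import math


def top_singular_energy(window, iters=50):
    """window: list of rows, one synchronized reading per stream in each row.

    Returns the fraction of total variance captured by the top singular
    direction of the mean-centered window (near 1.0 means strongly correlated).
    """
    n_rows, n_streams = len(window), len(window[0])
    means = [sum(row[j] for row in window) / n_rows for j in range(n_streams)]
    centered = [[row[j] - means[j] for j in range(n_streams)] for row in window]
    # Gram matrix A^T A; its eigenvalues are the squared singular values of A.
    gram = [
        [sum(r[a] * r[b] for r in centered) for b in range(n_streams)]
        for a in range(n_streams)
    ]
    total = sum(gram[j][j] for j in range(n_streams))
    if total == 0:
        return 0.0  # every stream is constant within the window

    def apply(vec):
        return [sum(gram[a][b] * vec[b] for b in range(n_streams)) for a in range(n_streams)]

    # Assumed start vector; power iteration fails only if it happens to be
    # exactly orthogonal to the top direction.
    v = [float(j + 1) for j in range(n_streams)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = apply(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    top = sum(x * y for x, y in zip(v, apply(v)))  # Rayleigh quotient
    return top / total
```

For two streams where one is an exact multiple of the other, the returned fraction is essentially 1.0. For two uncorrelated streams of equal variance it is 0.5.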

10.1145/956750.956814

Citations (58)

Clustering algorithm over distributed data stream

Hong Ma

To handle overlapping and missing data in distributed data streams, and to meet the need for lower communication costs, DAM-Distream, a clustering algorithm combining a density method and a model method, is proposed. The algorithm uses a Gaussian mixture model to describe the data streams flowing into the local distribution sites. However, the Gaussian mixture model parameters are obtained by the EM algorithm, which is sensitive to initial values. DAM-Distream therefore first applies a density-based algorithm to cluster the data streams, that is, to search for suitable initial parameters for the Gaussian mixture model. Second, the EM algorithm is used for iterative clustering, and then the algorithm determines. Finally, the models are uploaded to the central site for integrated treatment. Experimental results show that DAM-Distream can effectively overcome the shortcomings of the EM algorithm and obtain better GMM parameters. Experiments also show that it can improve the clustering quality of data streams in distributed systems and reduce the communication cost of the system.
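The dependence of EM on its starting point can be seen in a minimal one-dimensional sketch like the one below, where the initial means are passed in as if a density-based pre-clustering step had produced them. This is not DAM-Distream itself. The function name `em_gmm_1d`, the one-dimensional setting, and the variance floor are assumptions for illustration.

```python
import math


def em_gmm_1d(data, init_means, iters=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the supplied means."""
    k = len(init_means)
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    means = list(init_means)
    variances = [max(var_all, var_floor)] * k  # start every component wide
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against total underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)  # floor avoids collapse to zero
            weights[j] = nj / len(data)
    return weights, means, variances
```

With starting means placed near the true groups, EM settles on them within a few iterations. In a distributed setting, only the resulting weights, means, and variances would need to be sent to a central site rather than the raw data.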

Data stream clustering

Gaussian network model

Citations (0)

A Density Granularity Grid Clustering Algorithm Based on Data Stream

Communications in computer and information science (2011)

Lifang Wang Xie Han

Granularity

Data stream clustering

10.1007/978-3-642-24273-1_15

Citations (0)

Clustering Data Streams over Sliding Windows by DCA

Studies in computational intelligence (2013)

Ta Minh Thuy Hoai An Le Thi Lydia Boudjeloud-Assala

Sliding window protocol

Data stream clustering

Data set

10.1007/978-3-319-00293-4_6

Citations (3)

Research on Parallel Data Stream Clustering Algorithm Based on Grid and Density

Weihua Hu Mingzhong Cheng Guoping Wu Liang Wu

With the emergence of big data and cloud computing, data streams arrive rapidly, continuously, and at large scale, and real-time data stream clustering analysis has become a hot topic in current data stream mining research. Some existing data stream clustering algorithms cannot deal effectively with high-dimensional data streams, are unable to find clusters of arbitrary shape in real time, and cannot remove noise points in a timely manner. To address these issues, this paper proposes PGDC-Stream, an algorithm based on grid and density for clustering data streams in a parallel distributed environment [4]. The algorithm adopts a density threshold function to deal with noise points, inspecting and removing them periodically. It can also find clusters of arbitrary shape in large-scale data flows in real time. The Map-Reduce framework is used for parallel cluster analysis of data streams.
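The general grid-and-density idea can be sketched on a single machine as follows: map points to grid cells, discard cells below a density threshold as noise, and join neighbouring dense cells into clusters. This is not PGDC-Stream. It omits the Map-Reduce parallelism and the periodic noise inspection, and the function name `grid_density_clusters` and its parameters are hypothetical.

```python
from collections import Counter, deque
from itertools import product


def grid_density_clusters(points, cell_size, min_density):
    """Return clusters as sorted lists of dense grid cells (tuples of cell indices)."""
    # Count how many points fall into each grid cell.
    counts = Counter(tuple(int(c // cell_size) for c in p) for p in points)
    # Cells below the threshold are treated as noise and ignored.
    dense = {cell for cell, n in counts.items() if n >= min_density}
    if not dense:
        return []
    dims = len(next(iter(dense)))
    # All neighbour offsets, including diagonals, excluding the zero offset.
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first flood fill over adjacent dense cells
            cell = queue.popleft()
            members.append(cell)
            for off in offsets:
                neighbour = tuple(c + d for c, d in zip(cell, off))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(members))
    return clusters
```

Because clusters are grown cell by cell, their shape follows the dense region rather than a fixed centre, which is how grid-based methods can recover clusters of arbitrary shape. The number of neighbour offsets grows as 3^d, so this naive neighbourhood search is only practical in low dimensions.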

Data stream clustering

10.1109/csma.2015.21

Citations (6)

Data Stream clustering using Micro clusters

International journal of advance research and innovative ideas in education (2018)

Jyoti Shantaram Pawar

Data streams are massive, dynamic, and unbounded, which makes data stream clustering a challenging problem. Data streams are observed in network monitoring, critical scientific applications, weather monitoring, astronomical applications, electronic business, stock trading, and so on. Data stream clustering puts additional constraints on clustering algorithms: streams must be processed in a single pass with limited memory and little processing time, yet they can be highly dynamic. Most existing clustering algorithms are distance based and unable to handle interwoven clusters, and because streams are unbounded it is impossible to store them in full. The proposed work focuses on density-based clustering algorithms using micro-clusters. The process is divided into two phases, online and offline: micro-clusters are created in the online phase, and final clusters are generated in the offline phase.
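The two-phase structure can be sketched as below. The online phase keeps only a count and a linear sum per micro-cluster, and the offline phase links sufficiently heavy micro-clusters whose centres are close. This is a simplified illustration, not the proposed algorithm. The names `MicroCluster`, `online_update`, and `offline_clusters`, and the parameters `radius`, `min_weight`, and `link_distance`, are assumptions.

```python
import math


class MicroCluster:
    """Summary of nearby points: a count and a per-dimension linear sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def online_update(micro_clusters, point, radius):
    """Online phase: one pass, constant work per micro-cluster, no raw points stored."""
    best = min(micro_clusters, key=lambda mc: math.dist(mc.center(), point), default=None)
    if best is not None and math.dist(best.center(), point) <= radius:
        best.absorb(point)
    else:
        micro_clusters.append(MicroCluster(point))


def offline_clusters(micro_clusters, min_weight, link_distance):
    """Offline phase: drop light micro-clusters, then link the rest by centre distance."""
    dense = [mc for mc in micro_clusters if mc.n >= min_weight]
    labels = [-1] * len(dense)
    current = 0
    for i in range(len(dense)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:  # depth-first search over micro-clusters within link_distance
            a = stack.pop()
            for b in range(len(dense)):
                close = math.dist(dense[a].center(), dense[b].center()) <= link_distance
                if labels[b] == -1 and close:
                    labels[b] = current
                    stack.append(b)
        current += 1
    return dense, labels
```

The offline step runs only over the summaries, so it can be repeated on demand without revisiting the stream. A fuller density-based version would typically also decay micro-cluster weights over time so that stale regions fade out.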

Data stream clustering

Online and offline

Citations (1)