Research on Parallel Data Stream Clustering Algorithm Based on Grid and Density
6 Citations, 2 References, 10 Related Papers
Abstract:
With the emergence of big data and cloud computing, data streams arrive rapidly, continuously, and at large scale, and real-time data stream clustering has become a hot topic in data stream mining. Some existing data stream clustering algorithms cannot deal effectively with high-dimensional data streams, cannot find clusters of arbitrary shape in real time, and cannot remove noise points promptly. To address these issues, this paper proposes PGDC-Stream, an algorithm based on grid and density for clustering data streams in a parallel distributed environment [4]. The algorithm uses a density threshold function to handle noise points, inspecting and removing them periodically. It can also find clusters of arbitrary shape in large-scale data streams in real time. The Map-Reduce framework is used for parallel cluster analysis of the data streams.
Keywords:
Data stream clustering
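The periodic noise removal described in the abstract can be illustrated with a small grid-density summary. This is only a sketch under stated assumptions, not PGDC-Stream itself: the exponential decay, the fixed threshold, and the names `DensityGrid`, `cell_width`, `decay`, `noise_threshold`, and `prune_every` are hypothetical, since the abstract does not give the density threshold function or any parameters. The Map-Reduce parallelization is omitted.

```python
from math import floor


class DensityGrid:
    """Toy grid-density summary of a stream (illustrative, not PGDC-Stream)."""

    def __init__(self, cell_width=1.0, decay=0.9, noise_threshold=0.5, prune_every=4):
        # All four parameters are assumptions; the abstract does not specify them.
        self.cell_width = cell_width
        self.decay = decay                    # per-step exponential decay factor
        self.noise_threshold = noise_threshold
        self.prune_every = prune_every        # inspect the grid every N arrivals
        self.density = {}                     # cell -> density at its last update
        self.last_update = {}                 # cell -> time step of its last update
        self.t = 0

    def cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(floor(x / self.cell_width) for x in point)

    def current_density(self, cell):
        # Decay the stored density by the time elapsed since the last update.
        age = self.t - self.last_update[cell]
        return self.density[cell] * self.decay ** age

    def insert(self, point):
        self.t += 1
        cell = self.cell_of(point)
        if cell in self.density:
            self.density[cell] = self.current_density(cell) + 1.0
        else:
            self.density[cell] = 1.0
        self.last_update[cell] = self.t
        if self.t % self.prune_every == 0:
            self.prune()

    def prune(self):
        # Periodic inspection: drop cells whose decayed density marks them as noise.
        sparse = [c for c in self.density
                  if self.current_density(c) < self.noise_threshold]
        for c in sparse:
            del self.density[c]
            del self.last_update[c]

    def cells(self):
        return sorted(self.density)
```

In this sketch a cell that stops receiving points fades below the threshold and is dropped at the next inspection, while cells that keep receiving points survive. Grouping the surviving neighboring cells would then yield clusters of arbitrary shape.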
This study presents a density-based incremental clustering algorithm that incorporates the concept of fuzzy sets. Existing fuzzy clustering algorithms are typically c-means algorithms, in which the number of clusters must be predefined; the proposed algorithm instead incorporates fuzzy sets into density-based clustering, where the number of clusters is not restricted. Moreover, the proposed algorithm uses incremental clustering, which is usually employed in stream data clustering, and therefore runs in linear rather than quadratic time, unlike other density-based clustering algorithms. The proposed algorithm outperforms other existing density-based clustering algorithms in terms of both clustering results and computation time. As a result, it can process large data sets much more efficiently than other density-based clustering algorithms.
Data stream clustering
FLAME clustering
Single-linkage clustering
Constrained clustering
Citations (1)
A high-efficiency clustering algorithm is presented for solving the standard-cell placement problem with a very large number of cells. Compared with traditional clustering algorithms, its main feature is that an information library containing all interconnect relationships between cells is built and organized before placement, which gives the clustering algorithm global optimality and avoids redundant computation. The clustering algorithm has been applied to a quadratic placement procedure. Experimental results show that it performs very well in both clustering quality and clustering speed, and thus successfully solves the placement problem with a very large number of cells.
Data stream clustering
Constrained clustering
Clustering high-dimensional data
Single-linkage clustering
Citations (1)
In data mining, clustering is the most popular, powerful and commonly used unsupervised learning technique. It groups similar data objects into clusters based on some similarity measure. Clustering algorithms can be categorized into seven groups: hierarchical, density-based, partitioning, graph-based, grid-based, model-based and combinational clustering algorithms. These algorithms give different results depending on the conditions. Some clustering techniques are better suited to large data sets, and some give good results when finding clusters of arbitrary shape. This paper studies and compares various data mining clustering algorithms. The algorithms examined are K-Means, K-Medoids, distributed K-Means, hierarchical clustering, grid-based clustering and density-based clustering. The paper compares these algorithms on a range of factors and then describes which of them should be used under different conditions to get the best result.
Data stream clustering
Single-linkage clustering
Hierarchical clustering
Citations (59)
The density-based distributed clustering algorithm DBDC has high time complexity in the distributed clustering process. We propose an improved density-based distributed clustering algorithm. At the local level, the algorithm uses a data grid mapping method that first maps data objects to a spatial grid, improving the efficiency of local clustering. At the global level, we propose a global clustering method based on the intersection of representative points, which uses the central point of the representative points to reduce the clustering error. Experimental results show that the proposed algorithm is more accurate than DBDC.
Data stream clustering
FLAME clustering
DBSCAN
k-medians clustering
Citations (1)
Data stream clustering
Single-linkage clustering
FLAME clustering
Constrained clustering
Clustering high-dimensional data
Citations (45)
The CluStream algorithm gives poor clustering quality for non-spherical clusters, and most grid-based clustering algorithms improve clustering efficiency at the cost of clustering accuracy. This paper presents GTSClu, a new grid-based minimum spanning tree clustering algorithm for data streams. It is divided into online processing and offline clustering, and combines grid resolution with minimum spanning tree techniques. GTSClu can not only find clusters of arbitrary shape and number but also deal with noise data effectively, improving both the efficiency and the quality of clustering.
Data stream clustering
Single-linkage clustering
Citations (0)
In recent years, the processing and management of data streams has become a topic of active research in several fields of computer science. A data stream is a continuously growing sequence of time-stamped data. Data streams are produced in many applications, such as network monitoring, telecommunication systems, stock markets, customer click streams, and any type of multi-sensor system. Because of the large number of data stream applications, data stream clustering has become an important technique in data mining and knowledge discovery. STREAM is a data stream clustering algorithm that divides the data into chunks, clusters each chunk, and then clusters the resulting centers. An important limitation of STREAM is that it does not adapt to evolving data streams: it is not sensitive to the evolution of the underlying stream. In many cases, the patterns in the underlying stream may evolve and change significantly. It is therefore critical for the clustering process to adapt to such changes and provide insights over different time horizons. In this paper we propose an improved STREAM clustering method that makes the STREAM algorithm adaptive to drift by adjusting itself as the data stream changes.
Data stream clustering
Stream Processing
Citations (5)
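The chunk-then-recluster scheme that the abstract above attributes to STREAM can be sketched in a few lines. This is a minimal illustration, not the published algorithm: plain Lloyd's k-means with a deterministic initialization stands in for STREAM's clustering step, the intermediate centers are not weighted by how many points they represent, and the function names `kmeans` and `stream_cluster` and the parameters `k` and `chunk_size` are hypothetical.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on tuples (a stand-in for STREAM's clustering step)."""
    pts = sorted(points)
    # Deterministic initialization: k points spread evenly over the sorted input.
    centers = [pts[(i * (len(pts) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def stream_cluster(stream, k, chunk_size):
    """Cluster each chunk, then cluster the collected chunk centers."""
    intermediate = []
    chunk = []
    for p in stream:
        chunk.append(p)
        if len(chunk) == chunk_size:
            intermediate.extend(kmeans(chunk, k))
            chunk = []
    if chunk:  # final, possibly short, chunk
        intermediate.extend(kmeans(chunk, min(k, len(chunk))))
    return kmeans(intermediate, k)
```

Because every chunk is summarized once and never revisited, old and new data contribute equally to the final centers in this sketch, which is the insensitivity to stream evolution that the abstract identifies.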
This paper presents a new data clustering technique: a grid-based clustering scheme that uses an intuitive neighbor relationship to enhance clustering performance. Compared with other algorithms, this improved grid-based clustering algorithm substantially reduces repetitive clustering checks of neighboring grids and greatly improves the efficiency of data processing. Our simulations demonstrate that the proposed technique delivers better performance, in terms of clustering correctness rate and noise filtering rate, than the well-known existing algorithms GOD-CS, CLIQUE and TING. To the best of our knowledge, the proposed data clustering technique may currently be the fastest method available.
Data stream clustering
Affinity propagation
Citations (11)
In view of the efficiency and quality issues in both grid-based and density-based clustering algorithms, this paper proposes DGCA (density and grid based clustering algorithm), which combines the two approaches. The algorithm first divides the data space into grids, then stores the data in the grid cells and uses DBSCAN to perform clustering. Finally, it merges clusters, eliminates noise points, and maps the local clustering results to the global clustering results. The algorithm is verified in experiments on an artificial data set, and the results show that it improves both time efficiency and clustering quality.
DBSCAN
Data stream clustering
FLAME clustering
Citations (0)
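The grid-then-merge pipeline described for DGCA above can be illustrated with a simplified sketch. It is not the authors' method: the per-cell DBSCAN step is replaced by a plain point-count threshold on each cell, and the function name `grid_density_cluster` and the parameters `cell_width` and `min_pts` are assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import product
from math import floor


def grid_density_cluster(points, cell_width, min_pts):
    """Label points by connected groups of dense grid cells; -1 marks noise."""
    # Step 1: divide the data space into grid cells and store each point in its cell.
    cells = defaultdict(list)
    for p in points:
        cells[tuple(floor(x / cell_width) for x in p)].append(p)

    # Step 2 (simplification): a cell is dense if it holds at least min_pts points.
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    # Step 3: merge adjacent dense cells into clusters with a breadth-first search.
    # Checking all 3^d neighbor offsets is only practical in low dimensions.
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1

    # Step 4: map cell labels back to points; points in sparse cells become noise.
    return {p: labels.get(cell, -1) for cell, members in cells.items() for p in members}
```

Working on cells rather than on individual points is what makes the merge step cheap in this sketch: the search visits each dense cell once, however many points the cell contains.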