Research on Parallel Data Stream Clustering Algorithm Based on Grid and Density

Weihua Hu Mingzhong Cheng Guoping Wu Liang Wu

Abstract:

With the emergence of big data and cloud computing, data streams arrive rapidly, continuously, and at large scale, and real-time data stream clustering has become a hot topic in current data stream mining research. Some existing data stream clustering algorithms cannot effectively handle high-dimensional data streams, cannot find clusters of arbitrary shape in real time, and cannot remove noise points promptly. To address these issues, this paper proposes PGDC-Stream, an algorithm based on grid and density for clustering data streams in a parallel distributed environment [4]. The algorithm adopts a density threshold function to handle noise points, inspecting and removing them periodically. It can also find clusters of arbitrary shape in large-scale data streams in real time. The Map-Reduce framework is used for parallel cluster analysis of data streams.
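As a rough illustration of the grid-and-density idea in the abstract, the sketch below maps each arriving point to a grid cell, keeps an exponentially decayed density per cell, and periodically drops cells whose density has fallen below a threshold. The cell width, decay factor, threshold value, and all names (`cell_of`, `DecayedGrid`) are assumptions chosen for illustration. The abstract does not specify PGDC-Stream's actual density threshold function or how the work is split across Map-Reduce, so this is not the authors' implementation.

```python
from math import floor

CELL_WIDTH = 1.0   # assumed grid cell width
DECAY = 0.5        # assumed decay factor per time step
THRESHOLD = 0.3    # assumed minimum density for a cell to survive pruning


def cell_of(point, width=CELL_WIDTH):
    """Map a point (tuple of floats) to integer grid-cell coordinates."""
    return tuple(floor(x / width) for x in point)


class DecayedGrid:
    """Per-cell density that decays exponentially with elapsed time."""

    def __init__(self):
        self.density = {}      # cell -> density as of its last update
        self.last_update = {}  # cell -> time of its last update

    def insert(self, point, t):
        cell = cell_of(point)
        old = self.density.get(cell, 0.0)
        dt = t - self.last_update.get(cell, t)
        # Decay the stored density to time t, then count the new point.
        self.density[cell] = old * (DECAY ** dt) + 1.0
        self.last_update[cell] = t

    def prune(self, t):
        """Remove cells whose density, decayed to time t, is below THRESHOLD."""
        removed = []
        for cell in list(self.density):
            current = self.density[cell] * DECAY ** (t - self.last_update[cell])
            if current < THRESHOLD:
                removed.append(cell)
                del self.density[cell]
                del self.last_update[cell]
        return removed


grid = DecayedGrid()
for t, p in enumerate([(0.1, 0.2), (0.3, 0.4), (5.5, 5.5), (0.2, 0.9)]):
    grid.insert(p, t)
removed = grid.prune(t=5)  # the isolated cell (5, 5) is treated as noise
```

In this toy run the cell (0, 0) receives three points and survives the pruning pass, while the cell (5, 5) receives a single point and decays below the assumed threshold.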

Keywords:

Data stream clustering

Topics:

Advanced Clustering Algorithms Research

Data Stream Mining Techniques

Anomaly Detection Techniques and Applications

10.1109/csma.2015.21

A Fuzzy Density-based Incremental Clustering Algorithm

Sirisup Laohakiat Photchanan Ratanajaipan Leenhapat Navaravong Rachanee Ungrangsi Krissada Maleewong

This study presents a density-based incremental clustering algorithm that incorporates the concept of fuzzy sets. Unlike other existing fuzzy clustering algorithms, which are c-means variants that require the number of clusters to be defined in advance, the proposed algorithm brings fuzzy sets into density-based clustering, where the number of clusters is not restricted. Moreover, the proposed algorithm uses incremental clustering, an approach usually employed in stream data clustering, which leads to linear computation time rather than the quadratic computation time of other density-based clustering algorithms. The proposed algorithm outperforms other existing density-based clustering algorithms in terms of both clustering results and computation time. As a result, it can process large data sets much more efficiently than other density-based clustering algorithms.

Data stream clustering

FLAME clustering

Single-linkage clustering

Constrained clustering

10.1109/jcsse.2018.8457385

Citations (1)

High Efficiency Clustering Algorithm for Standard Cell Placement

Dianzi xuebao (2001)

Jun Gu

A high-efficiency clustering algorithm is presented for solving the standard-cell placement problem with a very large number of cells. Compared with traditional clustering algorithms, its main feature is that an information library containing all interconnect relationships between cells is built and organized before placement, which gives the clustering algorithm global optimality and avoids redundant computation. The clustering algorithm has been applied to a quadratic placement procedure. Experimental results show that the algorithm performs very well in both clustering quality and clustering speed, and thus successfully solves the placement problem with a very large number of cells.

Data stream clustering

Constrained clustering

Clustering high-dimensional data

Single-linkage clustering

Citations (1)

The best clustering algorithms in data mining

K M Archana Patel Prateek Thakral

In data mining, clustering is the most popular, powerful, and commonly used unsupervised learning technique. It groups similar data objects into clusters based on some similarity measure. Clustering algorithms can be categorized into seven groups: hierarchical, density-based, partitioning, graph-based, grid-based, model-based, and combinational clustering algorithms. These algorithms give different results depending on the conditions. Some clustering techniques are better suited to large data sets, and some give good results when finding clusters of arbitrary shape. This paper studies and compares various data mining clustering algorithms. The algorithms explored are K-Means, K-Medoids, distributed K-Means clustering, hierarchical clustering, grid-based clustering, and density-based clustering. The paper compares all of these algorithms according to a number of factors. After the comparison, it describes which clustering algorithms should be used under which conditions to obtain the best results.

Data stream clustering

Single-linkage clustering

Hierarchical clustering

10.1109/iccsp.2016.7754534

Citations (59)

An Improved Distributed Clustering Algorithm Based on Density

Jianxiao Chen Yongli Li Peng Sun Minghui Sun Rui Mao

The density-based distributed clustering algorithm DBDC has a relatively high time complexity in the process of distributed clustering. We propose an improved density-based distributed clustering algorithm. At the local level, the algorithm first maps data objects to a spatial grid using a data grid mapping method, which improves the efficiency of local clustering. At the global level, we propose a global clustering method based on the intersection of representative points, and use the central point of the representative points to reduce the clustering error. Experimental results show that the proposed algorithm is more accurate than DBDC.

Data stream clustering

FLAME clustering

DBSCAN

k-medians clustering

10.1109/icinis.2015.58

Citations (1)

An incremental density-based clustering framework using fuzzy local clustering

Information Sciences (2020)

Sirisup Laohakiat Vera Sa‐ing

Data stream clustering

Single-linkage clustering

FLAME clustering

Constrained clustering

Clustering high-dimensional data

10.1016/j.ins.2020.08.052

Citations (45)

A Grid and MST Based Clustering Algorithm for Data Streams

Computer Systems and Applications (2011)

Hai Wang

The CluStream algorithm yields poor clustering quality for non-spherical clusters, while most grid-based clustering algorithms improve clustering efficiency at the cost of clustering accuracy. This paper presents GTSClu, a new grid-based minimum spanning tree clustering algorithm for data streams. It is divided into online processing and offline clustering, and combines grid resolution with minimum spanning tree techniques. GTSClu can not only find clusters of arbitrary shape and number, but also handle noisy data effectively, improving both the efficiency and the quality of clustering.
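The combination of a grid with a minimum spanning tree can be sketched as follows, under stated assumptions. Points are bucketed into grid cells, sparse cells are discarded as noise, and Kruskal's algorithm builds a spanning forest over the remaining cells, with edges longer than a cut-off left out so that each tree becomes one cluster. The function names, the `min_points` density rule, and the `max_edge` cut-off are hypothetical. The abstract does not describe how GTSClu builds or cuts its tree, nor how the online and offline phases interact.

```python
from collections import Counter
from itertools import combinations
from math import dist, floor


def dense_cells(points, width, min_points):
    """Return the grid cells that contain at least min_points points."""
    counts = Counter(tuple(floor(x / width) for x in p) for p in points)
    return sorted(c for c, n in counts.items() if n >= min_points)


def mst_clusters(cells, max_edge):
    """Kruskal's MST over cell coordinates, omitting edges longer than max_edge."""
    parent = {c: c for c in cells}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    edges = sorted((dist(a, b), a, b) for a, b in combinations(cells, 2))
    for length, a, b in edges:
        if length > max_edge:
            break  # every remaining edge is longer and would be cut
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    clusters = {}
    for c in cells:
        clusters.setdefault(find(c), []).append(c)
    return sorted(clusters.values())


points = [(0.1, 0.1), (0.5, 0.5), (1.2, 0.3), (1.8, 0.9), (1.1, 1.1),
          (1.9, 1.5), (5.2, 5.2), (5.7, 5.9), (9.5, 9.5)]
cells = dense_cells(points, width=1.0, min_points=2)
clusters = mst_clusters(cells, max_edge=1.5)
```

Here the L-shaped group of cells (0, 0), (1, 0), (1, 1) ends up in one cluster, the cell (5, 5) forms a second cluster, and the single point in cell (9, 9) is dropped as noise.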

Data stream clustering

Single-linkage clustering

Citations (0)

A Novel Algorithm for Adaptive Data Stream Clustering

Farnaz Ansarifar Ali Ahmadi

In recent years, the processing and management of data streams has become a topic of active research in several fields of computer science. A data stream is a continuously growing sequence of time-stamped data. Data streams are produced in many applications, such as network monitoring, telecommunication systems, stock markets, customer click streams, and multi-sensor systems of any type. Because of the large number of data stream applications, data stream clustering has become an important technique in data mining and knowledge discovery. STREAM is a data stream clustering algorithm that divides the data into chunks, clusters each chunk, and then clusters the resulting centers again. An important limitation of STREAM is that it does not adapt to evolving data streams; in particular, it is not sensitive to changes in the underlying data stream. In many cases, the patterns in the underlying stream may evolve and change significantly. It is therefore critical for the clustering process to adapt to such changes and to provide insights over different time horizons. In this paper we propose an improved STREAM clustering method that keeps the STREAM algorithm adaptive to drift by adjusting itself as the data stream changes.
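The chunk-then-recluster scheme attributed to STREAM above can be sketched in a few lines. This is a simplified illustration, not the published algorithm: it substitutes plain Lloyd's k-means with a deterministic start for STREAM's own clustering subroutine, it ignores the weights that would normally be attached to intermediate centers, and the names `kmeans` and `stream_cluster` are hypothetical. It also does not include the drift adaptation proposed in the paper, which the abstract does not detail.

```python
from math import dist


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means, started from the first k points (an assumption)."""
    centers = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def stream_cluster(stream, chunk_size, k):
    """Cluster each chunk, then cluster the collected chunk centers."""
    intermediate = []
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        intermediate.extend(kmeans(chunk, min(k, len(chunk))))
    return kmeans(intermediate, k)


stream = [(0.0,), (10.0,), (1.0,), (11.0,), (0.0,), (10.0,), (1.0,), (11.0,)]
centers = stream_cluster(stream, chunk_size=4, k=2)
# centers == [(0.5,), (10.5,)]
```

Because only the chunk centers are retained between chunks, memory use stays bounded by the number of chunks times `k` rather than by the length of the stream.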

Data stream clustering

Stream Processing

10.1109/icee.2018.8472649

Citations (5)

An effective and efficient grid-based data clustering algorithm using intuitive neighbor relationship for data mining

Cheng-Fa Tsai Sheng-Chiang Huang

This paper presents a new grid-based data clustering technique that uses an intuitive neighbor relationship to enhance clustering performance. Compared with other algorithms, this improved grid-based clustering algorithm substantially reduces repetitive clustering checks of neighboring grids and greatly improves the efficiency of data processing. Our simulations demonstrate that the proposed technique delivers better performance, in terms of clustering correctness rate and noise filtering rate, than other well-known existing algorithms: GOD-CS, CLIQUE and TING. To the best of our knowledge, the proposed data clustering technique may be the fastest method currently available.

Data stream clustering

Affinity propagation

10.1109/icmlc.2015.7340603

Citations (11)

Novel clustering algorithm based on grid and density

Jisuanji yingyong yanjiu (2011)

Shiyong Xiong

To address the efficiency and quality issues of both grid-based and density-based clustering algorithms, this paper proposes DGCA (density and grid based clustering algorithm), which combines the two approaches. The algorithm first divides the data space into grids, then stores the data in the grid cells and uses DBSCAN to perform clustering. Finally, it merges clusters, eliminates noise points, and maps the local clustering results to the global clustering results. The algorithm is verified on an artificial data set, and the results show that it improves both time efficiency and clustering quality.
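The three steps listed above (partition the space, cluster within the grid, then merge clusters and eliminate noise) can be approximated by the minimal sketch below. It makes a deliberate simplification: instead of running point-level DBSCAN, it treats a cell as dense when it holds at least `min_pts` points and merges adjacent dense cells with a breadth-first search, labeling points in sparse cells as noise. The function name `grid_cluster` and both parameters are hypothetical, and the result is a cell-level stand-in for DGCA rather than the published method.

```python
from collections import defaultdict, deque
from itertools import product
from math import floor


def grid_cluster(points, width, min_pts):
    """Return one label per point: a cluster id, or -1 for noise."""
    # Step 1: divide the space into cells and store point indices per cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(floor(x / width) for x in p)].append(idx)

    # Step 2: a cell is dense if it holds at least min_pts points.
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    # Step 3: merge adjacent dense cells; points in sparse cells stay at -1.
    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            # Neighbours differ by -1, 0, or +1 in every dimension.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels


points = [(0.1, 0.1), (0.4, 0.6), (1.2, 0.5), (1.7, 0.2),
          (6.1, 6.3), (6.8, 6.6), (3.5, 3.5)]
labels = grid_cluster(points, width=1.0, min_pts=2)
# labels == [0, 0, 0, 0, 1, 1, -1]
```

Working at the cell level means the merge step touches each dense cell once, which is the usual motivation for placing a grid in front of a density-based method.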

DBSCAN

Data stream clustering

FLAME clustering

Citations (0)