    A Bilayer Daily Load Curve Clustering Method Based on Short-Time Cross-Correlation
    Abstract:
The diversity of residential consumer load curves in shape, amplitude, and time domain increases the difficulty of clustering analysis. To address this problem, this paper proposes a bilayer clustering method for daily load curves based on short-time cross-correlation. First, to cluster the load curves more effectively, the short-time cross-correlation coefficient is used as the distance metric, weakening the effect of time-domain differences on shape similarity. The shape clusters are then further clustered by amplitude, using the Euclidean distance as the distance metric. The proposed method not only accounts for the influence of temporal characteristics on the distance metric, but can also distinguish both the shape and the amplitude of daily load curves. A case study on measured data verifies the effectiveness of the proposed method for load pattern recognition; a sketch of the two layers follows the keywords below.
Keywords: Similarity (geometry), k-medians clustering
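The following minimal sketch shows how such a bilayer scheme might be realised. The hypothetical `stcc_distance` helper, the window length `win`, the maximum lag `max_lag`, hierarchical clustering for the shape layer, and Euclidean k-means within each shape cluster for the amplitude layer are all assumptions; the paper's exact formulation is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

def stcc_distance(x, y, win=12, max_lag=2):
    """Shape distance from short-time cross-correlation (illustrative).

    Each window is z-normalised so amplitude differences do not affect the
    shape layer; within each window the best normalised cross-correlation
    over small time shifts is taken, weakening time-domain misalignment.
    """
    coeffs = []
    for s in range(0, len(x) - win + 1, win):
        xs, ys = x[s:s + win], y[s:s + win]
        xs = (xs - xs.mean()) / (xs.std() + 1e-12)
        ys = (ys - ys.mean()) / (ys.std() + 1e-12)
        best = -1.0
        for lag in range(-max_lag, max_lag + 1):
            lo, hi = max(0, lag), min(win, win + lag)
            c = np.dot(xs[lo:hi], ys[lo - lag:hi - lag]) / win
            best = max(best, c)
        coeffs.append(best)
    return 1.0 - float(np.mean(coeffs))  # high correlation -> small distance

def bilayer_cluster(curves, n_shape=4, n_amp=3):
    n = len(curves)
    # Layer 1: shape clustering on the precomputed STCC distance matrix.
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = stcc_distance(curves[i], curves[j])
    shape_labels = fcluster(linkage(squareform(d), method="average"),
                            n_shape, criterion="maxclust")
    # Layer 2: within each shape cluster, Euclidean k-means separates amplitudes.
    labels = {}
    for c in np.unique(shape_labels):
        idx = np.where(shape_labels == c)[0]
        k = min(n_amp, len(idx))
        amp = KMeans(n_clusters=k, n_init=10).fit_predict(curves[idx])
        for i, a in zip(idx, amp):
            labels[i] = (c, a)
    return labels

# Example: 96-point daily load curves (15-minute sampling).
curves = np.random.default_rng(0).random((20, 96))
print(bilayer_cluster(curves, n_shape=3, n_amp=2))
```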
Related Papers:

Clustering is a useful technique that organizes a large quantity of unordered data into a small number of meaningful and coherent clusters. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, Manhattan distance, and relative entropy. In this paper, we compare and analyze the effectiveness of these measures for clustering high-dimensional datasets. Our experiments apply the basic K-means algorithm together with PCA, and we report results on simulated high-dimensional datasets for the two distance/similarity measures most commonly used in clustering. The results indicate that squared Euclidean distance performs considerably better than Manhattan distance.
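A sketch of the kind of comparison described, assuming simulated Gaussian blobs, a PCA projection to 10 components, and a small Lloyd loop with a pluggable scipy metric; the sample sizes and component counts are illustrative, not the paper's.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

def lloyd(X, k, metric, rng, iters=50):
    """Lloyd-style clustering with a pluggable distance metric."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = cdist(X, centers, metric=metric).argmin(axis=1)
        # Mean minimises squared-Euclidean cost; median minimises Manhattan.
        update = np.median if metric == "cityblock" else np.mean
        new = []
        for j in range(k):
            pts = X[labels == j]
            new.append(update(pts, axis=0) if len(pts) else centers[j])
        centers = np.array(new)
    return labels

rng = np.random.default_rng(0)
X, y = make_blobs(n_samples=500, n_features=100, centers=5, random_state=0)
Xp = PCA(n_components=10).fit_transform(X)   # reduce dimension before clustering

for metric in ("sqeuclidean", "cityblock"):
    labels = lloyd(Xp, k=5, metric=metric, rng=rng)
    print(metric, "ARI:", round(adjusted_rand_score(y, labels), 3))
```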
    Distance measures
    Similarity (geometry)
    k-medians clustering
    Complete-linkage clustering
    Minkowski distance
    Single-linkage clustering
    Consensus clustering
    Citations (0)
To address the slow convergence and poor clustering quality of the traditional k-means method when initial cluster centers are selected at random, an improved selection method for the k-means initial cluster centers combining a spatial similarity measure is presented. By defining the similarity of spatial samples and selecting the samples of lowest similarity as initial cluster centers, the number of iterations needed for the clustering to stabilize is decreased and the efficiency of the clustering algorithm is improved. Results on UCI data sets show that the improved initial-center selection speeds up convergence and obtains excellent clustering results compared with traditional k-means.
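The abstract does not give the similarity definition, so the sketch below assumes a Gaussian kernel of Euclidean distance; "selecting the samples of lowest similarity" then reduces to a farthest-first (max-min distance) seeding.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dissimilar_centers(X, k, rng):
    """Pick k initial centers with pairwise low similarity (far apart).

    Similarity is assumed here to be a Gaussian kernel of Euclidean
    distance, standing in for the paper's unspecified measure; choosing
    the least similar sample is then a max-min distance selection.
    """
    centers = [X[rng.integers(len(X))]]          # seed with a random sample
    for _ in range(k - 1):
        d = cdist(X, np.asarray(centers)).min(axis=1)
        sim = np.exp(-d ** 2)                    # similarity to nearest chosen center
        centers.append(X[sim.argmin()])          # least similar sample next
    return np.asarray(centers)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
print(dissimilar_centers(X, k=3, rng=rng))
```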
The k-means algorithm fails to correctly cluster data points in high-dimensional space, primarily because it employs the Euclidean norm as its distance metric. The Euclidean metric grows with the data dimension, making it difficult to separate intra-cluster from inter-cluster data points, and k-means clustering realized with the Euclidean distance norm therefore often misguides the selection of cluster centres in a given iteration. This paper proposes a novel approach to the k-means clustering algorithm that replaces the Euclidean distance metric with a new one. The merit of the proposed metric lies in keeping distances low even for high-dimensional data points, which enables the algorithm to correctly select the cluster centres over the iterations. Experiments revealed that k-means clustering based on this distance metric outperforms the traditional algorithm by a large margin.
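The replacement metric itself is not stated in this abstract, so the snippet below only illustrates the problem being described and one generic bounded alternative (dividing by the square root of the dimension); that normalisation is an assumption, not the paper's metric.

```python
import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 20, 200, 2000):
    x, y = rng.normal(size=(2, dim))
    euc = np.linalg.norm(x - y)      # grows roughly with sqrt(dim)
    bounded = euc / np.sqrt(dim)     # per-dimension RMS difference stays O(1)
    print(f"dim={dim:5d}  euclidean={euc:7.2f}  dimension-normalised={bounded:5.2f}")
```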
Clustering is a method of grouping data into several clusters so that the data within one cluster have a high level of similarity while data in different clusters have a low level of similarity. One such method is Fuzzy C-Means (FCM), a clustering technique in which the membership of each data point in a cluster is determined by a degree of membership. The FCM objective function requires a distance; the distance used in this study is squared Euclidean distance. The study clusters the popularity of e-commerce in Indonesia in 2019, using as variables the average number of monthly visitors, the number of website visitors, the number of social media followers (Twitter, Instagram, and Facebook), and the number of workers. The result is the popularity level of e-commerce in Indonesia, divided into gold, silver, and bronze tiers. The clustering results were evaluated with the Partition Entropy Index (PEI) and Classification Entropy (CE); the closer the values are to 0, the better the result. The PEI is 2.9697e-0 and the CE is 2.5710e-04, so based on these two indexes it can be concluded that FCM with squared Euclidean distance clusters the data well.
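A compact sketch of the standard FCM update with squared Euclidean distance, including the Partition Entropy check mentioned above; the toy data, the fuzzifier m = 2, and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, eps=1e-6, seed=0):
    """Fuzzy C-Means with squared Euclidean distance. U[i, j] is the
    membership degree of sample i in cluster j; m > 1 is the fuzzifier."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # squared Euclidean distance from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # standard membership update: u_ij = 1 / sum_k (d2_ij / d2_ik)^(1/(m-1))
        U_new = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1 / (m - 1))).sum(axis=2)
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return centers, U

# Toy rows = e-commerce sites; cols = visitor, follower, and worker features.
X = np.random.default_rng(1).random((30, 6))
centers, U = fuzzy_c_means(X, c=3)                 # three tiers: gold/silver/bronze
pe = -(U * np.log(U + 1e-12)).sum() / len(X)       # Partition Entropy: closer to 0 is better
print("partition entropy:", pe)
```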
The existing K-means algorithm suffers from random selection of initial cluster centers, sensitivity to outliers, and an inability to unify data of different scales. This paper describes an improved K-means based on density parameters and normalized distance (K-DPND) that addresses these problems. In the initial-center selection stage, K-DPND constructs a set of density parameters from the distance matrix and the average distance of the data set, selects the point with the largest density parameter as a cluster center, and sets to 0 the density parameter of every point whose distance from this center is less than the average distance, looping until k initial cluster centers are found. In the clustering stage, the normalized distance replaces the Euclidean distance when determining the cluster to which each point belongs, and the median replaces the mean when computing new cluster centers. Compared with K-means, K-medoids, IK-DM, and KICIC, K-DPND achieves good clustering results in most cases.
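The abstract does not define the density parameter exactly; the sketch below assumes it is the count of samples within the mean pairwise distance of a point, which matches the described select-and-suppress loop.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def density_init(X, k):
    """Initial centers via density parameters, in the spirit of K-DPND.

    Density parameter = number of samples within the mean pairwise
    distance of a point (an assumption; the abstract gives no formula).
    The densest point becomes a center, densities in its neighbourhood
    are zeroed, and the loop repeats until k centers are found.
    """
    D = squareform(pdist(X))                      # full distance matrix
    avg = D[np.triu_indices_from(D, 1)].mean()    # average pairwise distance
    density = (D < avg).sum(axis=1).astype(float)
    centers = []
    for _ in range(k):
        i = int(density.argmax())
        centers.append(X[i])
        density[D[i] < avg] = 0.0                 # suppress the new center's neighbourhood
    return np.asarray(centers)

X = np.random.default_rng(2).normal(size=(300, 2))
print(density_init(X, k=4))
```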
Because the K-means algorithm generates its initial cluster centers at random, its clustering results are unstable and susceptible to noisy data objects. This paper presents a density-based algorithm for determining the initial cluster centers, eliminating the dependence of the clustering results on them. It also optimizes the re-calculation of cluster centers and the distance from each data object to its cluster center, reducing the impact of noise on the clustering results and accommodating clusters of asymmetric density. Experiments on UCI datasets show that the improved algorithm removes the dependence on initial cluster centers and obtains more compact clusters; the improved K-means clustering algorithm is therefore effective.
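How the re-calculation is optimized is not specified in the abstract, so the following is only one plausible reading: a bounded inverse-distance weighting that damps the pull of noisy objects on the new center.

```python
import numpy as np

def robust_center(points, old_center):
    """Recompute a cluster center while damping noise points.

    An illustrative stand-in for the re-calculation the abstract alludes
    to (its exact formula is not given): each point is weighted by the
    inverse of its distance to the previous center, so far-away noise
    objects pull the new center far less than a plain mean would.
    """
    d = np.linalg.norm(points - old_center, axis=1)
    w = 1.0 / (1.0 + d)                      # bounded inverse-distance weights
    return (w[:, None] * points).sum(axis=0) / w.sum()

pts = np.vstack([np.random.default_rng(3).normal(size=(50, 2)),
                 [[25.0, 25.0]]])            # one gross outlier
print("plain mean:   ", pts.mean(axis=0))
print("robust center:", robust_center(pts, old_center=np.zeros(2)))
```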
Images are important in the real world for transferring messages from a source to a destination, and data mining techniques can convert such images into useful information. Existing research on image clustering uses k-means and fuzzy k-means with the Euclidean distance, where each cluster needs its own centre and the Euclidean distance measures the separation between points; this distance calculation does not give efficient results for clustering. To improve efficiency, this paper proposes non-Euclidean distance measures for the distance calculation, using logical points to find the clusters. The results show image clustering based on the modified non-Euclidean distance, and the performance demonstrates the efficiency of non-Euclidean distance measures.
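Neither the non-Euclidean measure nor the "logical points" are defined in this abstract, so the sketch below substitutes cosine distance on pixel colours (a spherical-k-means-style loop) purely as one concrete non-Euclidean choice.

```python
import numpy as np
from scipy.spatial.distance import cdist

def cluster_pixels(pixels, k, iters=20, seed=0):
    """Cluster image pixels with cosine distance, one concrete
    non-Euclidean choice (the paper's own measure is not spelled out)."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = cdist(pixels, centers, metric="cosine").argmin(axis=1)
        for j in range(k):
            pts = pixels[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels, centers

# A toy "image": 64x64 RGB values flattened to (4096, 3).
img = np.random.default_rng(4).integers(1, 256, size=(64, 64, 3))
labels, centers = cluster_pixels(img.reshape(-1, 3).astype(float), k=4)
print(np.bincount(labels))                   # pixels per colour cluster
```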
In this paper, an efficient random centroid k-means (RCKM) method is used to provide better centroid-distance selection during clustering. The Pima Indians diabetes database is used for experimentation and comparative analysis. The distance measures considered are Euclidean, Pearson coefficient, Chebyshev, and Canberra. The centroid value is first randomized in each iteration and processed for clustering, finding the minimum and maximum centroid-distance weights over several iterations. Based on the distance measures above, centroid distances are calculated to find the minimum and maximum centroid distances for the clustering process, and the time taken to process the whole dataset is also measured. Experimental results show that the RCKM approach outperforms previous work in finding centroid distances at both the minimum and maximum spans in all cases.
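The four centroid-distance measures named above are standard and can be written directly; the Pima-style record values below are made up for illustration, and "1 − Pearson correlation" is the usual way of turning the coefficient into a distance, assumed here.

```python
import numpy as np

# The four centroid-distance measures the study compares, as plain functions.
def euclidean(x, y): return float(np.sqrt(((x - y) ** 2).sum()))
def chebyshev(x, y): return float(np.abs(x - y).max())
def canberra(x, y):  return float((np.abs(x - y) / (np.abs(x) + np.abs(y) + 1e-12)).sum())
def pearson(x, y):   return float(1.0 - np.corrcoef(x, y)[0, 1])  # correlation -> distance

x = np.array([5.0, 116.0, 74.0, 0.0, 0.0, 25.6, 0.201, 30.0])  # made-up Pima-style record
y = np.array([1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0])
for f in (euclidean, pearson, chebyshev, canberra):
    print(f.__name__, round(f(x, y), 4))
```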
The centroid is the central point of the data in the grouping process, so analyzing the centroid is necessary for determining the initial values of the clustering process; it serves as the cluster center point in the X-Means algorithm. This study determines cluster center points (centroids) and measures the performance of the X-Means algorithm with a cluster-range parameter by measuring distances between centroids, as a fast and efficient way to group unstructured data; it also splits several centroids in half to fit the data, speeding up model construction and serving as a test of the X-Means analysis. Testing the X-Means algorithm with the number of centroid clusters determined by a modified X-Means method yields results in 11 iterations. These tests produce cluster members with a good level of similarity between data items, and in determining the number of clusters, the modified Euclidean-distance method obtains a better similarity level for each member than determining the number of clusters at random over several iterations.
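A much-simplified sketch of the X-Means splitting idea: grow the number of centroids and keep the larger model only while a BIC-style score improves. The global doubling and the crude spherical-Gaussian score are simplifications; true X-Means splits each centroid individually under the full BIC.

```python
import numpy as np
from sklearn.cluster import KMeans

def xmeans_like(X, k_min=2, k_max=12, seed=0):
    """Double the centroid count while a BIC-style score keeps improving."""
    def bic(data, labels, centers):
        n, d = data.shape
        k = len(centers)
        sse = sum(((data[labels == j] - centers[j]) ** 2).sum() for j in range(k))
        var = sse / max(n - k, 1) + 1e-12
        # crude spherical-Gaussian criterion: fit term + complexity penalty
        return n * np.log(var) + k * (d + 1) * np.log(n)   # lower is better
    km = KMeans(n_clusters=k_min, n_init=10, random_state=seed).fit(X)
    while km.n_clusters * 2 <= k_max:
        km2 = KMeans(n_clusters=km.n_clusters * 2, n_init=10, random_state=seed).fit(X)
        if bic(X, km2.labels_, km2.cluster_centers_) >= bic(X, km.labels_, km.cluster_centers_):
            break                                          # split did not pay off
        km = km2
    return km

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=c, size=(60, 2)) for c in (0, 6, 12)])
print("chosen k:", xmeans_like(X).n_clusters)
```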