A Unified Formulation of k-Means, Fuzzy c-Means and Gaussian Mixture Model by the Kolmogorov-Nagumo Average.

Osamu Komori, Shinto Eguchi

2021

Clustering is a major unsupervised learning task and is widely applied in data mining and statistical data analysis. Typical examples include k-means, fuzzy c-means, and Gaussian mixture models, which are categorized as hard, soft, and model-based clustering, respectively. We propose a new clustering method, called Pareto clustering, based on the Kolmogorov–Nagumo average, which is defined by a survival function of the Pareto distribution. The proposed algorithm incorporates all the aforementioned clustering methods as well as maximum-entropy clustering. We introduce a probabilistic framework for the proposed method, in which we discuss the underlying distribution that ensures consistency. We build a minorize-maximization algorithm to estimate the parameters in Pareto clustering. We compare its performance with existing methods in simulation studies and in benchmark dataset analyses to demonstrate its practical utility.
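
As a rough illustration of the idea behind the abstract, the sketch below computes a Kolmogorov–Nagumo (quasi-arithmetic) average, phi_inv(sum_k w_k * phi(d_k)), over the squared distances from one point to several cluster centers. It does not reproduce the paper's Pareto-based generator: the exponential generator phi(d) = exp(-beta * d), the function names kn_average and exp_generator, and the distance values are all assumptions chosen for illustration. With this generator the average behaves like a soft minimum: a large beta approaches the minimum distance (the per-point term in the k-means objective), while a small beta approaches the plain arithmetic mean.

```python
import numpy as np

def kn_average(values, phi, phi_inv, weights=None):
    """Kolmogorov-Nagumo average: phi_inv(sum_k w_k * phi(v_k)).

    Hypothetical helper for illustration; uniform weights by default.
    """
    values = np.asarray(values, dtype=float)
    if weights is None:
        weights = np.full(values.shape[-1], 1.0 / values.shape[-1])
    return phi_inv(np.sum(weights * phi(values), axis=-1))

def exp_generator(beta):
    """Exponential generator (an assumption, not the paper's Pareto form)."""
    phi = lambda d: np.exp(-beta * d)
    phi_inv = lambda s: -np.log(s) / beta
    return phi, phi_inv

# Illustrative squared distances from one point to three cluster centers.
d = np.array([0.5, 2.0, 4.5])

# Identity generator gives the ordinary arithmetic mean.
arithmetic = kn_average(d, lambda x: x, lambda x: x)

# Sweep beta: small beta is close to the mean, large beta close to the minimum.
soft_min = {beta: kn_average(d, *exp_generator(beta))
            for beta in (0.001, 1.0, 100.0)}

print("arithmetic mean:", arithmetic)
for beta, value in soft_min.items():
    print(f"beta={beta}: {value}")
```

Swapping in a different generator pair (phi, phi_inv) changes how strongly the nearest center dominates the average, which is the kind of interpolation between hard and soft assignments that the abstract describes.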
