A multiple clusterings model based on Gaussian mixture

Andrea Pastore, Stefano Federico Tonellato


2015

We consider, from a Gaussian model-based perspective, the problem of identifying different clusterings of a given set of units, where each clustering is described by a subset of the observed variables. In particular, we assume that it is possible to identify, among the observed variables, $T$ subsets, each of which gives rise to a different clustering, and a complementary subset of irrelevant variables that provides no information about the clustering. We assume that the distributions of the $T$ clustering subsets are independent Gaussian mixtures, while the conditional distribution of the irrelevant variables, given those included in the $T$ subsets, is Gaussian. To identify the $T$ subsets of variables and to estimate the model parameters, we propose a generalization of an algorithm by Raftery and Dean, which is tailored to variable selection in Gaussian mixture models and addresses the model comparison problem using approximate Bayes factors. The proposed algorithm provides a forward-stepwise identification of the $T$ clustering subsets of variables. Some results from Monte Carlo experiments and from an application to real data are presented.
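To make the role of approximate Bayes factors more concrete, here is a minimal sketch of only the simplest comparison in a Raftery-and-Dean-style forward search. For a single candidate variable, it asks whether that variable shows evidence of univariate clustering. It fits a two-component Gaussian mixture by EM and a single Gaussian, then uses the BIC difference as an approximation to 2 log B. Several things here are illustrative assumptions: the function names, the fixed choice of two components, the median-split initialisation, and the "larger is better" BIC scale (2 log L − p log n). It is not the authors' algorithm, which handles multivariate mixtures and $T$ clustering subsets.

```python
import math
import random


def normal_logpdf(x, mu, var):
    """Log-density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_single_gaussian(xs):
    """ML fit of one Gaussian; returns (log-likelihood, number of free parameters)."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)
    return sum(normal_logpdf(x, mu, var) for x in xs), 2


def fit_two_component_mixture(xs, iters=200):
    """EM fit of a two-component univariate Gaussian mixture.

    Returns (log-likelihood, number of free parameters).
    """
    n = len(xs)
    s = sorted(xs)
    halves = [s[: n // 2], s[n // 2:]]  # crude initialisation: split at the median
    w = [0.5, 0.5]
    mu = [sum(h) / len(h) for h in halves]
    var = [max(sum((x - m) ** 2 for x in h) / len(h), 1e-6) for h, m in zip(halves, mu)]
    for _ in range(iters):
        # E-step: posterior membership probabilities (responsibilities)
        resp = []
        for x in xs:
            logs = [math.log(w[k]) + normal_logpdf(x, mu[k], var[k]) for k in range(2)]
            total = log_sum_exp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: responsibility-weighted updates of weights, means and variances
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-10)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    loglik = sum(
        log_sum_exp([math.log(w[k]) + normal_logpdf(x, mu[k], var[k]) for k in range(2)])
        for x in xs
    )
    return loglik, 5  # two means, two variances, one free mixing weight


def bic(loglik, n_params, n):
    """BIC on the 'larger is better' scale: 2 log L - p log n."""
    return 2.0 * loglik - n_params * math.log(n)


def clustering_evidence(xs):
    """Approximate 2 log Bayes factor: two-cluster mixture versus one Gaussian."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    n = len(xs)
    return bic(*fit_two_component_mixture(xs), n) - bic(*fit_single_gaussian(xs), n)


if __name__ == "__main__":
    rng = random.Random(1)
    bimodal = [rng.gauss(-3, 1) for _ in range(150)] + [rng.gauss(3, 1) for _ in range(150)]
    unimodal = [rng.gauss(0, 1) for _ in range(300)]
    print("bimodal:", clustering_evidence(bimodal))
    print("unimodal:", clustering_evidence(unimodal))
```

A positive value favours treating the variable as clustering-relevant. In a forward-stepwise search, a comparison like this would be repeated over candidate variables, and later steps would condition on the variables already selected, for example by regressing a candidate on them. The paper's generalization to $T$ independent clustering subsets is not attempted in this sketch.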
