Cluster analysis of curved-shaped data with species-sampling mixture models

2013 
An interesting problem, seldom addressed within the Bayesian nonparametric framework, is the clustering of data whose support is "curved" (in a Euclidean space). Recently, we addressed this problem by introducing a model that combines two ingredients: species sampling mixtures of parametric densities on the one hand, and a deterministic clustering procedure (DBSCAN) on the other. The model is called b-DBSCAN. In short, under this model two observations share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold. However, the prior for the random partition under the b-DBSCAN model, i.e. the prior on cluster assignments, is induced by the geometry of the space of kernel densities rather than elicited directly as a random partition prior. Bayesian statisticians would prefer the latter alternative. One could achieve this by using dependent Dirichlet processes (for instance, nested Dirichlet processes, nDP) in the mixture framework, where the prior cluster assignments are given on the latent random mixing measures. As a different approach, we also consider a new model under which the data are distributed as a mixture of kernel densities centered around curves, which describe the "shape" of the data on the sample space. Such kernels are built following the theory of principal curves introduced by Hastie and Stuetzle (1989). In this work, the three mixture models (b-DBSCAN, nDP-mixtures, and the new model) are compared in order to better understand the pros and cons of each one in this clustering context.
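The b-DBSCAN clustering rule described above (observations are grouped when the densities attached to their latent parameters are closer than a threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the kernels are univariate Gaussians, the distance is the Hellinger distance, the latent parameters are given as fixed `(mean, sd)` pairs rather than sampled from a posterior, and clusters are formed as connected components of the thresholded graph, which is a simplified stand-in for DBSCAN. The names `hellinger_gaussian` and `threshold_clusters` are hypothetical.

```python
import math


def hellinger_gaussian(m1, s1, m2, s2):
    """Hellinger distance between N(m1, s1^2) and N(m2, s2^2) (closed form)."""
    var_sum = s1 * s1 + s2 * s2
    # Bhattacharyya coefficient of the two Gaussian densities.
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(
        -((m1 - m2) ** 2) / (4.0 * var_sum)
    )
    return math.sqrt(max(0.0, 1.0 - bc))


def threshold_clusters(params, eps):
    """Label observations by connected components of the graph that links
    i and j whenever the distance between their kernel densities is < eps.

    params: list of (mean, sd) latent parameters, one per observation
            (assumed given; in a real model they would be posterior draws).
    """
    n = len(params)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if hellinger_gaussian(*params[i], *params[j]) < eps:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    latent = [(0.0, 1.0), (0.2, 1.0), (0.4, 1.0), (5.0, 1.0), (5.1, 1.0)]
    print(threshold_clusters(latent, eps=0.2))
```

Because linking is transitive, a chain of nearby densities ends up in one cluster even when its endpoints are farther apart than the threshold, which is what lets a threshold rule of this kind follow elongated or curved supports.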