Dimension reduction for data of unknown cluster structure

Ewa Nowakowska, Jacek Koronacki, Stan Lipovetsky

2014

For numerous reasons, there arises a need for dimension reduction that preserves certain characteristics of data. In this work we focus on data coming from a mixture of Gaussian distributions, and we propose a method that preserves the distinctness of the clustering structure, even though that structure is assumed to be unknown. The rationale behind the method is the following: (i) had one known the clusters (classes) within the data, one could facilitate further analysis and reduce the dimension of the space by projecting the data onto Fisher's linear subspace, which by definition best preserves the structure of the given classes; (ii) under some reasonable assumptions, this can be done, albeit approximately, without prior knowledge of the clusters (classes). In the paper, we show how this approach works. We present a preliminary data transformation that brings the directions of largest overall variability close to the directions of best between-class separation. Hence, for the transformed data, simple PCA provides an approximation to Fisher's subspace. We show that the transformation largely preserves the distinctness of the unknown structure in the data.
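The premise of points (i) and (ii) can be illustrated with a small numerical sketch. It does not implement the authors' transformation, which the abstract does not specify. Instead it assumes the class labels are known, computes Fisher's subspace from them, and compares it with plain PCA on two synthetic Gaussian mixtures: one with spherical within-class covariance, and one in which a single non-discriminative coordinate is stretched. The helper names `fisher_subspace`, `pca_subspace` and `subspace_distance`, the cluster means, the sample sizes and the stretch factor are all hypothetical choices made for illustration.

```python
import numpy as np

# Illustrative only: this is NOT the transformation proposed in the paper.

def fisher_subspace(X, y, k):
    """Orthonormal basis of the top-k Fisher (LDA) directions, i.e. the
    leading generalized eigenvectors of B v = lambda W v (labels known)."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    W = np.zeros((d, d))  # within-class scatter
    B = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        B += len(Xc) * (diff @ diff.T)
    # Reduce to a symmetric eigenproblem via the Cholesky factor W = L L^T.
    Linv = np.linalg.inv(np.linalg.cholesky(W))
    _, U = np.linalg.eigh(Linv @ B @ Linv.T)  # eigenvalues in ascending order
    V = Linv.T @ U[:, ::-1][:, :k]            # back-transform the top-k vectors
    return np.linalg.qr(V)[0]                 # orthonormalize the basis

def pca_subspace(X, k):
    """Orthonormal basis of the top-k principal directions (no labels used)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:k].T

def subspace_distance(U, V):
    """Sine of the largest principal angle between span(U) and span(V):
    0 means the subspaces coincide, 1 means some direction is orthogonal."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.sqrt(max(0.0, 1.0 - s.min() ** 2)))

# Hypothetical data: 3 spherical Gaussian clusters in 10-D whose means
# differ only in the first two coordinates.
rng = np.random.default_rng(0)
d, n = 10, 1000
means = np.zeros((3, d))
means[1, 0] = 8.0
means[2, 1] = 8.0
X = np.vstack([rng.standard_normal((n, d)) + m for m in means])
y = np.repeat(np.arange(3), n)
truth = np.eye(d)[:, :2]  # span of the true between-class directions

# Same data with one non-discriminative coordinate stretched (variance 400).
X_stretched = X.copy()
X_stretched[:, 2] *= 20.0

dist = {
    "pca_spherical": subspace_distance(pca_subspace(X, 2), truth),
    "pca_stretched": subspace_distance(pca_subspace(X_stretched, 2), truth),
    "fisher_stretched": subspace_distance(fisher_subspace(X_stretched, y, 2), truth),
}
for name, value in dist.items():
    print(f"{name}: {value:.3f}")
```

With spherical clusters the total covariance equals the within-class part, a multiple of the identity, plus the between-class part, so its leading eigenvectors span the between-class directions and PCA approximately recovers Fisher's subspace without labels. Stretching one noise coordinate breaks this: PCA then picks up the stretched direction, while the label-based Fisher subspace is unaffected. A preliminary transformation of the kind described above would aim to restore the first situation without knowing the labels.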
