SubSpace Projection: A unified framework for a class of partition-based dimension reduction techniques

2009 
Similarity search in high dimensional space is a nontrivial problem due to the so-called curse of dimensionality. Recent techniques such as Piecewise Aggregate Approximation (PAA), Segmented Means (SMEAN) and Mean-Standard deviation (MS) prove to be very effective in reducing data dimensionality by partitioning dimensions into subsets and extracting aggregate values from each dimension subset. These partition-based techniques have many advantages including very efficient multi-phased approximation while being simple to implement. They, however, are not adaptive to the different characteristics of data in diverse applications. We propose SubSpace Projection (SSP) as a unified framework for these partition-based techniques. SSP projects data onto subspaces and computes a fixed number of salient features with respect to a reference vector. A study of the relationships between query selectivity and the corresponding space partitioning schemes uncovers indicators that can be used to predict the performance of the partitioning configuration. Accordingly, we design a greedy algorithm to efficiently determine a good partitioning of the data dimensions. The results of our extensive experiments indicate that the proposed method consistently outperforms state-of-the-art techniques.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    36
    References
    14
    Citations
    NaN
    KQI
    []