Core-Sets For Canonical Correlation Analysis

Saurabh Paul

2015

Canonical Correlation Analysis (CCA) is a technique that measures how "similar" the subspaces spanned by the columns of two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times l}$ are. CCA measures similarity by means of the cosines of the so-called principal angles between the two subspaces. Those values are also known as the canonical correlations of the matrix pair $(A, B)$. In this work, we consider the over-constrained case, where the number of rows is greater than the number of columns ($m > \max(n, l)$). We study the problem of constructing "core-sets" for CCA. A core-set is a subset of rows from $A$ together with the corresponding subset of rows from $B$, denoted by $\hat{A}$ and $\hat{B}$, respectively. A "good" core-set is a subset of rows such that the canonical correlations of the core-set $(\hat{A}, \hat{B})$ are "close" to the canonical correlations of the original matrix pair $(A, B)$. There is a natural tradeoff between the core-set size and the approximation accuracy of a core-set. We present two algorithms, namely single-set spectral sparsification and leverage-score sampling, which find core-sets with additive-error guarantees on the canonical correlations.
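To make the objects in the abstract concrete, the following is a minimal NumPy sketch, not the paper's algorithm as stated. It computes canonical correlations as the singular values of $Q_A^T Q_B$, where $Q_A$ and $Q_B$ are orthonormal bases for the column spaces, and then builds a row core-set by randomized sampling. The function names, the choice to take leverage scores from the concatenated matrix $[A \; B]$, sampling with replacement, and the $1/\sqrt{r p_i}$ rescaling are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def canonical_correlations(A, B):
    """Cosines of the principal angles between range(A) and range(B).

    Assumes A and B have the same number of rows and full column rank.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for the column space of A
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for the column space of B
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing values slightly above 1.
    return np.clip(sigma, 0.0, 1.0)


def leverage_score_coreset(A, B, r, rng):
    """Sample r rows (with replacement) from A and the same rows from B.

    Assumption of this sketch: sampling probabilities are proportional to
    the leverage scores of the concatenated matrix [A, B].
    """
    C = np.hstack([A, B])
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    rank = int(np.sum(s > s[0] * 1e-12))      # numerical rank of [A, B]
    lev = np.sum(U[:, :rank] ** 2, axis=1)    # row leverage scores
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])         # same rescaling for both matrices
    return idx, A[idx] * scale[:, None], B[idx] * scale[:, None]


# Synthetic over-constrained example: m > max(n, l).
rng = np.random.default_rng(0)
m, n, l, r = 2000, 5, 4, 500
A = rng.standard_normal((m, n))
B = A[:, :l] + 0.5 * rng.standard_normal((m, l))  # B is correlated with A

idx, A_hat, B_hat = leverage_score_coreset(A, B, r, rng)
exact = canonical_correlations(A, B)
approx = canonical_correlations(A_hat, B_hat)
print("max additive error:", np.max(np.abs(exact - approx)))
```

Applying the same rescaling factor to a sampled row of $A$ and to the corresponding row of $B$ keeps the two matrices aligned row by row, which the definition of a core-set requires. Increasing `r` trades a larger core-set for a smaller additive error in the canonical correlations.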
