Core-Sets For Canonical Correlation Analysis

2015 
Canonical Correlation Analysis (CCA) is a technique that measures how "similar" two subspaces are, namely the subspaces spanned by the columns of two real matrices A (of size m × n) and B (of size m × l). CCA measures similarity by means of the cosines of the so-called principal angles between the two subspaces. Those values are also known as the canonical correlations of the matrix pair (A, B). In this work, we consider the over-constrained case, where the number of rows is greater than the number of columns (m > max(n, l)). We study the problem of constructing "core-sets" for CCA. A core-set is a subset of rows from A and the corresponding subset of rows from B, denoted by Â and B̂, respectively. A "good" core-set is a subset of rows such that the canonical correlations of the core-set (Â, B̂) are "close" to the canonical correlations of the original matrix pair (A, B). There is a natural tradeoff between the size of a core-set and its approximation accuracy. We present two algorithms, namely single-set spectral sparsification and leverage-score sampling, which find core-sets with additive-error guarantees for the canonical correlations.
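As an illustration only, the following minimal sketch computes canonical correlations as the cosines of the principal angles (via QR factorizations followed by an SVD) and then builds a core-set by sampling rows. The abstract does not specify the sampling details, so several choices here are assumptions: leverage scores are taken from the concatenated matrix [A, B], rows are drawn with replacement and rescaled, and the function names `canonical_correlations` and `leverage_score_core_set` are hypothetical. It also assumes A and B have full column rank.

```python
import numpy as np


def canonical_correlations(A, B):
    """Cosines of the principal angles between range(A) and range(B).

    Assumes A and B have full column rank and more rows than columns.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for the column space of A
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for the column space of B
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(sigma, 0.0, 1.0)  # guard against round-off above 1


def leverage_score_core_set(A, B, r, rng):
    """Sample r rows of A and B (same indices) by leverage scores.

    Assumption: scores come from the concatenation [A, B]; the source
    abstract does not state which matrix the scores are computed from.
    """
    C = np.hstack([A, B])
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    U = U[:, s > s[0] * 1e-12]          # keep a basis for range(C) only
    lev = np.sum(U**2, axis=1)          # row leverage scores
    p = lev / lev.sum()                 # sampling probabilities
    idx = rng.choice(A.shape[0], size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])   # standard importance rescaling
    return A[idx] * scale[:, None], B[idx] * scale[:, None]


# Synthetic example: B shares part of its column space with A.
rng = np.random.default_rng(0)
m, n, l = 2000, 5, 4
A = rng.standard_normal((m, n))
B = A @ rng.standard_normal((n, l)) + 0.5 * rng.standard_normal((m, l))

full = canonical_correlations(A, B)
A_hat, B_hat = leverage_score_core_set(A, B, r=400, rng=rng)
core = canonical_correlations(A_hat, B_hat)
print("full data:", full)
print("core-set: ", core)
print("max additive error:", np.max(np.abs(full - core)))
```

Increasing `r` should generally shrink the additive error at the cost of a larger core-set, which is the tradeoff the abstract describes. The sketch is not the paper's algorithm and carries none of its guarantees.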