
Matrix regularization

In the field of statistical learning theory, matrix regularization generalizes notions of vector regularization to cases where the object to be learned is a matrix. The purpose of regularization is to enforce conditions, for example sparsity or smoothness, that can produce stable predictive functions. For example, in the more common vector framework, Tikhonov regularization optimizes over

$$\min_{x} \; \|Ax - y\|_{2}^{2} + \lambda \|x\|_{2}^{2}$$

to find a vector $x$ that is a stable solution to the regression problem. When the system is described by a matrix rather than a vector, this problem can be written as

$$\min_{X} \; \|AX - Y\|_{F}^{2} + \lambda \|X\|_{F}^{2},$$

where the vector norm enforcing a regularization penalty on $x$ has been extended to a matrix norm on $X$.

Matrix regularization has applications in matrix completion, multivariate regression, and multi-task learning. Ideas of feature and group selection can also be extended to matrices, and these can be generalized to the nonparametric case of multiple kernel learning.

Consider a matrix $W$ to be learned from a set of examples $S = (X_{i}^{t}, y_{i}^{t})$, where $i$ runs from $1$ to $n$ and $t$ runs from $1$ to $T$. Let each input matrix $X_{i}$ lie in $\mathbb{R}^{D \times T}$, and let $W$ be of size $D \times T$. A general model for the output $y$ can be posed as

$$y_{i}^{t} = \langle W, X_{i}^{t} \rangle_{F},$$

where the inner product is the Frobenius inner product. For different applications the matrices $X_{i}$ will have different forms, but for each of these the optimization problem to infer $W$ can be written as

$$\min_{W \in \mathcal{H}} \; E(W) + R(W),$$

where $E$ defines the empirical error for a given $W$, and $R(W)$ is a matrix regularization penalty. The function $R(W)$ is typically chosen to be convex, and is often selected to enforce sparsity (using $\ell^{1}$-norms) and/or smoothness (using $\ell^{2}$-norms). Finally, $W$ is in the space of matrices $\mathcal{H}$ with Frobenius inner product $\langle \dots \rangle_{F}$.

In the problem of matrix completion, the matrix $X_{i}^{t}$ takes the form

$$X_{i}^{t} = e_{i}' \, e_{t}^{\top},$$

where $(e_{t})_{t}$ and $(e_{i}')_{i}$ are the canonical bases in $\mathbb{R}^{T}$ and $\mathbb{R}^{D}$. In this case the role of the Frobenius inner product is to select individual elements $w_{i}^{t}$ from the matrix $W$. Thus, the output $y$ is a sampling of entries from the matrix $W$.
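The matrix form of the Tikhonov problem above decomposes over the columns of $X$ and $Y$, so it admits the same closed-form solution as vector ridge regression, $X = (A^{\top}A + \lambda I)^{-1} A^{\top} Y$. The following NumPy sketch illustrates this; the shapes and variable names are illustrative, not taken from the article.

```python
import numpy as np

def matrix_tikhonov(A, Y, lam):
    """Closed-form minimizer of ||A X - Y||_F^2 + lam * ||X||_F^2."""
    n_cols = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n_cols), A.T @ Y)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))      # design matrix
Y = rng.normal(size=(20, 3))      # multivariate responses
X = matrix_tikhonov(A, Y, lam=0.5)
print(X.shape)                    # (5, 3): one ridge solution per output column
```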
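To make the general model concrete, here is a minimal NumPy sketch of the prediction $y_{i}^{t} = \langle W, X_{i}^{t} \rangle_{F}$ and of the objective $E(W) + R(W)$, assuming a squared-error empirical risk and a squared Frobenius-norm penalty; the toy dimensions and names are assumptions made for illustration only.

```python
import numpy as np

def frobenius_inner(A, B):
    """Frobenius inner product <A, B>_F = sum of elementwise products."""
    return np.sum(A * B)

def objective(W, Xs, ys, lam):
    """Regularized empirical risk E(W) + R(W).

    Assumes squared-error loss; Xs is a list of D x T input matrices,
    ys the corresponding scalar outputs, and W a D x T parameter matrix.
    R(W) is taken here to be lam * ||W||_F^2 (a smoothness penalty).
    """
    preds = np.array([frobenius_inner(W, X) for X in Xs])
    empirical_error = np.mean((preds - np.asarray(ys)) ** 2)
    penalty = lam * np.sum(W ** 2)          # squared Frobenius norm of W
    return empirical_error + penalty

# Toy usage: D = 3, T = 2, ten example matrices.
rng = np.random.default_rng(0)
D, T = 3, 2
W_true = rng.normal(size=(D, T))
Xs = [rng.normal(size=(D, T)) for _ in range(10)]
ys = [frobenius_inner(W_true, X) for X in Xs]
print(objective(W_true, Xs, ys, lam=0.1))   # empirical error is zero here, so only the penalty remains
```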
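For the matrix completion case, each input $X_{i}^{t} = e_{i}' \, e_{t}^{\top}$ has a single nonzero entry, so the Frobenius inner product reads off one entry of $W$. A short sketch of this construction, again with hypothetical dimensions:

```python
import numpy as np

def completion_input(i, t, D, T):
    """X_i^t = e_i' (e_t)^T: the D x T matrix with a single 1 at position (i, t)."""
    e_i = np.zeros(D); e_i[i] = 1.0          # canonical basis vector in R^D
    e_t = np.zeros(T); e_t[t] = 1.0          # canonical basis vector in R^T
    return np.outer(e_i, e_t)

D, T = 4, 3
W = np.arange(D * T, dtype=float).reshape(D, T)
X = completion_input(2, 1, D, T)
# <W, X_i^t>_F picks out the single entry w_i^t of W:
print(np.sum(W * X), W[2, 1])                # both print 7.0
```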

[ "Regularization (mathematics)", "Matrix (mathematics)" ]
Parent Topic
Child Topic
    No Parent Topic