
Matrix calculus

In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, and/or of a multivariate function with respect to a single variable, into vectors and matrices that can be treated as single entities. This greatly simplifies operations such as finding the maximum or minimum of a multivariate function and solving systems of differential equations. The notation used here is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.

Two competing notational conventions split the field of matrix calculus into two separate groups. The two groups can be distinguished by whether they write the derivative of a scalar with respect to a vector as a column vector or a row vector. Both conventions are possible even under the common assumption that vectors are treated as column vectors when combined with matrices (rather than row vectors). A single convention can be somewhat standard throughout a field that commonly uses matrix calculus (e.g. econometrics, statistics, estimation theory and machine learning), but even within a given field different authors can be found using competing conventions. Authors of both groups often write as though their specific convention were standard, so serious mistakes can result when combining results from different authors without carefully verifying that compatible notations have been used. Definitions of the two conventions and comparisons between them are collected in the layout conventions section.

Matrix calculus refers to a number of different notations that use matrices and vectors to collect the derivative of each component of the dependent variable with respect to each component of the independent variable. In general, the independent variable can be a scalar, a vector, or a matrix, while the dependent variable can be any of these as well. Each situation leads to a different set of rules, or a separate calculus, using the broader sense of the term. Matrix notation serves as a convenient way to collect the many derivatives in an organized way. As a first example, consider the gradient from vector calculus.
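The practical content of the layout distinction is that the two conventions differ by a transpose. As a minimal numerical sketch (not from the source, assuming NumPy), the hypothetical helper below builds a finite-difference Jacobian in numerator layout for a linear map f(x) = Ax, whose derivative is A itself; the denominator-layout result is then simply its transpose.

```python
import numpy as np

def jacobian_numerator(f, x, eps=1e-6):
    # Finite-difference Jacobian in numerator layout:
    # row i collects the partials of output f_i w.r.t. every input x_j.
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x), dtype=float)
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.asarray(f(x + step), dtype=float) - f0) / eps
    return J

# Linear map f(x) = A x from R^2 to R^3; its derivative is A itself.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
f = lambda x: A @ x
x0 = np.array([0.5, -1.0])

J_num = jacobian_numerator(f, x0)  # 3x2: numerator layout, approximates A
J_den = J_num.T                    # 2x3: denominator layout is the transpose
```

Mixing the two layouts in one derivation is exactly the kind of mistake the paragraph above warns about: the matrices are transposes of each other, so products formed from them will silently have the wrong shape or value.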
For a scalar function of three independent variables, f(x₁, x₂, x₃), the gradient is given by the vector equation

∇f = (∂f/∂x₁) x̂₁ + (∂f/∂x₂) x̂₂ + (∂f/∂x₃) x̂₃,

where x̂ᵢ represents a unit vector in the xᵢ direction for 1 ≤ i ≤ 3. This type of generalized derivative can be seen as the derivative of a scalar, f, with respect to a vector, x, and its result can be easily collected in vector form. More complicated examples include the derivative of a scalar function with respect to a matrix, known as the gradient matrix, which collects the derivative with respect to each matrix element in the corresponding position in the resulting matrix. In that case the scalar must be a function of each of the independent variables in the matrix.

As another example, if we have an n-vector of dependent variables, or functions, of m independent variables, we might consider the derivative of the dependent vector with respect to the independent vector. The result could be collected in an m×n matrix consisting of all of the possible derivative combinations. Since each of the independent and dependent variables can be a scalar, a vector, or a matrix, there are, of course, a total of nine possibilities, and as the variables gain components the number of derivative combinations grows very large. The six kinds of derivatives that can be most neatly organized in matrix form are collected in the following table.
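The idea of collecting each partial derivative "in the corresponding position" can be sketched numerically. In the illustrative snippet below (a hypothetical helper, not from the source, assuming NumPy), `gradient_matrix` approximates the gradient matrix of a scalar function of a matrix by perturbing one entry at a time; for f(X) = Σᵢⱼ Xᵢⱼ², the analytic gradient matrix is 2X.

```python
import numpy as np

def gradient_matrix(f, X, eps=1e-6):
    # Numerical gradient matrix of a scalar function of a matrix:
    # entry (i, j) holds df/dX_ij, stored in the same position as X_ij.
    G = np.zeros_like(X, dtype=float)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            step = np.zeros_like(X, dtype=float)
            step[i, j] = eps
            G[i, j] = (f(X + step) - f(X)) / eps
    return G

# f(X) = sum of squares of the entries; its analytic gradient matrix is 2X.
f = lambda M: float(np.sum(M * M))
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

G = gradient_matrix(f, X)  # approximately 2 * X, same shape as X
```

The same one-entry-at-a-time loop generalizes to the other cases in the table: whichever of the nine scalar/vector/matrix combinations is in play, each partial derivative is computed separately and then arranged by the chosen layout convention.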

[ "Matrix (mathematics)" ]
Parent Topic
Child Topic
    No Parent Topic