Generalized least squares

In statistics, generalized least squares (GLS) is a technique for estimating the unknown parameters in a linear regression model when there is a certain degree of correlation between the residuals of the model. In these cases, ordinary least squares and weighted least squares can be statistically inefficient, or even give misleading inferences. GLS was first described by Alexander Aitken in 1934.

In standard linear regression models we observe data $\{y_i, x_{ij}\}_{i=1,\dots,n,\; j=2,\dots,k}$ on $n$ statistical units. The response values are placed in a vector $\mathbf{y} = (y_1, \dots, y_n)^{\mathsf{T}}$, and the predictor values are placed in the design matrix $\mathbf{X} = (\mathbf{x}_1^{\mathsf{T}}, \dots, \mathbf{x}_n^{\mathsf{T}})^{\mathsf{T}}$, where $\mathbf{x}_i = (1, x_{2i}, \dots, x_{ki})$ is the vector of the $k$ predictor variables (including a constant) for the $i$th unit. The model forces the conditional mean of $\mathbf{y}$ given $\mathbf{X}$ to be a linear function of $\mathbf{X}$, and assumes the conditional variance of the error term given $\mathbf{X}$ is a known nonsingular covariance matrix $\boldsymbol{\Omega}$. This is usually written as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = 0, \qquad \operatorname{Var}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \boldsymbol{\Omega}.$$

Here $\boldsymbol{\beta} \in \mathbb{R}^k$ is a vector of unknown constants (known as "regression coefficients") that must be estimated from the data.

Suppose $\mathbf{b}$ is a candidate estimate for $\boldsymbol{\beta}$. Then the residual vector for $\mathbf{b}$ is $\mathbf{y} - \mathbf{X}\mathbf{b}$. The generalized least squares method estimates $\boldsymbol{\beta}$ by minimizing the squared Mahalanobis length of this residual vector:

$$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{arg\,min}}\; (\mathbf{y} - \mathbf{X}\mathbf{b})^{\mathsf{T}} \boldsymbol{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\mathbf{b}).$$

Since the objective is a quadratic form in $\mathbf{b}$, the estimator has an explicit formula:

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{y}.$$

The GLS estimator is unbiased, consistent, efficient, and asymptotically normal, with $\operatorname{E}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta}$ and $\operatorname{Cov}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = (\mathbf{X}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X})^{-1}$.

GLS is equivalent to applying ordinary least squares to a linearly transformed version of the data. To see this, factor $\boldsymbol{\Omega} = \mathbf{C}\mathbf{C}^{\mathsf{T}}$, for instance using the Cholesky decomposition.
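As a concrete illustration of the explicit formula above, the following sketch computes the GLS estimate with NumPy on simulated data. The variable names, the sample size, and the AR(1)-style error covariance are illustrative assumptions, not part of the original text; only the formula itself comes from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: n units, k = 3 predictors including the constant,
# and a known AR(1)-style error covariance Omega (assumed for this sketch).
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])

rho = 0.6
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
eps = rng.multivariate_normal(np.zeros(n), Omega)
y = X @ beta_true + eps

# GLS estimate: beta_hat = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
Omega_inv_X = np.linalg.solve(Omega, X)
Omega_inv_y = np.linalg.solve(Omega, y)
beta_hat = np.linalg.solve(X.T @ Omega_inv_X, X.T @ Omega_inv_y)

# Conditional covariance of the estimator: (X' Omega^{-1} X)^{-1}.
cov_beta_hat = np.linalg.inv(X.T @ Omega_inv_X)

print(beta_hat)
```

Solving linear systems with `np.linalg.solve` rather than forming $\boldsymbol{\Omega}^{-1}$ explicitly is numerically preferable; the algebra is identical to the closed-form expression above.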
Then if we pre-multiply both sides of the equation $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ by $\mathbf{C}^{-1}$, we get an equivalent linear model $\mathbf{y}^{*} = \mathbf{X}^{*}\boldsymbol{\beta} + \boldsymbol{\varepsilon}^{*}$, where $\mathbf{y}^{*} = \mathbf{C}^{-1}\mathbf{y}$, $\mathbf{X}^{*} = \mathbf{C}^{-1}\mathbf{X}$, and $\boldsymbol{\varepsilon}^{*} = \mathbf{C}^{-1}\boldsymbol{\varepsilon}$. In this model $\operatorname{Var}[\boldsymbol{\varepsilon}^{*} \mid \mathbf{X}] = \mathbf{C}^{-1}\boldsymbol{\Omega}\left(\mathbf{C}^{-1}\right)^{\mathsf{T}} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. Thus we can efficiently estimate $\boldsymbol{\beta}$ by applying OLS to the transformed data, which requires minimizing

$$\left(\mathbf{y}^{*} - \mathbf{X}^{*}\mathbf{b}\right)^{\mathsf{T}} \left(\mathbf{y}^{*} - \mathbf{X}^{*}\mathbf{b}\right) = \left(\mathbf{y} - \mathbf{X}\mathbf{b}\right)^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \left(\mathbf{y} - \mathbf{X}\mathbf{b}\right).$$

This has the effect of standardizing the scale of the errors and "de-correlating" them. Since OLS is applied to data with homoscedastic errors, the Gauss–Markov theorem applies, and therefore the GLS estimate is the best linear unbiased estimator for $\boldsymbol{\beta}$.

A special case of GLS called weighted least squares (WLS) occurs when all the off-diagonal entries of $\boldsymbol{\Omega}$ are 0. This situation arises when the variances of the observed values are unequal (i.e. heteroscedasticity is present), but no correlations exist among the errors. The weight for unit $i$ is proportional to the reciprocal of the variance of the response for unit $i$.

If the covariance of the errors $\boldsymbol{\Omega}$ is unknown, one can obtain a consistent estimate of it, say $\widehat{\boldsymbol{\Omega}}$, using an implementable version of GLS known as the feasible generalized least squares (FGLS) estimator. In FGLS, modeling proceeds in two stages: (1) the model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals are used to build a consistent estimator of the error covariance matrix (to do so, one often needs to examine the model and add additional constraints; for example, if the errors follow a time-series process, a statistician generally needs some theoretical assumptions on that process to ensure that a consistent estimator is available); and (2) using this consistent estimator of the error covariance matrix, GLS is applied as described above.
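The equivalence between direct GLS and OLS on the whitened data described above can be checked numerically. The sketch below builds a small illustrative data set (the particular covariance and coefficients are assumptions for the example), computes the GLS estimate directly, then whitens with the Cholesky factor of $\boldsymbol{\Omega}$ and runs OLS on the transformed variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data with a known, correlated error covariance (assumed setup).
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
rho = 0.5
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = X @ np.array([0.5, 1.5]) + rng.multivariate_normal(np.zeros(n), Omega)

# Direct GLS: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
Oi_X = np.linalg.solve(Omega, X)
Oi_y = np.linalg.solve(Omega, y)
beta_gls = np.linalg.solve(X.T @ Oi_X, X.T @ Oi_y)

# Whitening: factor Omega = C C' (Cholesky), pre-multiply by C^{-1},
# then apply ordinary least squares to the transformed data.
C = np.linalg.cholesky(Omega)
X_star = np.linalg.solve(C, X)   # C^{-1} X
y_star = np.linalg.solve(C, y)   # C^{-1} y
beta_ols_star, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

print(np.allclose(beta_gls, beta_ols_star))  # True: the two routes agree
```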
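The two-stage FGLS idea can be sketched in the same style. This is a deliberately simple, assumption-laden illustration: it treats only the heteroscedastic (WLS-type) case in which $\boldsymbol{\Omega}$ is taken to be diagonal, estimates the variances by regressing log squared OLS residuals on the predictors (an auxiliary specification chosen here for illustration), and omits the modelling choices a real application would need, such as a time-series structure for the errors.

```python
import numpy as np

def fgls_diagonal(X, y):
    """Two-stage FGLS under the illustrative assumption that Omega is diagonal
    (pure heteroscedasticity), with log-variances modelled linearly in X."""
    # Stage 1: OLS gives consistent (but inefficient) estimates and residuals.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols

    # Estimate the error variances: regress log(residual^2) on X, exponentiate.
    gamma, *_ = np.linalg.lstsq(X, np.log(resid**2 + 1e-12), rcond=None)
    var_hat = np.exp(X @ gamma)            # estimated diagonal of Omega

    # Stage 2: GLS with the estimated covariance. For a diagonal Omega this
    # reduces to weighted least squares with weights 1 / var_hat.
    w = 1.0 / var_hat
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)

# Illustrative heteroscedastic data.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 3, size=n)])
sigma = 0.2 + 0.8 * X[:, 1]                # error sd grows with the predictor
y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=n)

print(fgls_diagonal(X, y))
```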

[ "Least squares", "Estimator", "Iteratively reweighted least squares", "Non-linear iterative partial least squares", "Least trimmed squares", "Residual sum of squares", "Lack-of-fit sum of squares" ]