In statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multivariate regression model with collinear predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

In the case of perfect multicollinearity (in which one independent variable is an exact linear combination of the others), the design matrix $X$ has less than full rank, and therefore the moment matrix $X^{\mathsf{T}}X$ cannot be inverted. Under these circumstances, for a general linear model $y = X\beta + \epsilon$, the ordinary least-squares estimator $\hat{\beta}_{OLS} = (X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}y$ does not exist. Note that in statements of the assumptions underlying regression analyses such as ordinary least squares, the phrase "no multicollinearity" is sometimes used to mean the absence of perfect multicollinearity, which is an exact (non-stochastic) linear relation among the regressors.

Collinearity is a linear association between two explanatory variables. Two variables are perfectly collinear if there is an exact linear relationship between them. For example, $X_{1}$ and $X_{2}$ are perfectly collinear if there exist parameters $\lambda_{0}$ and $\lambda_{1}$ such that, for all observations $i$, we have

$$X_{2i} = \lambda_{0} + \lambda_{1} X_{1i}.$$

Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. We have perfect multicollinearity if, for example as in the equation above, the correlation between two independent variables is equal to 1 or −1. In practice, we rarely face perfect multicollinearity in a data set. More commonly, the issue of multicollinearity arises when there is an approximate linear relationship among two or more independent variables.
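As a concrete illustration, the following minimal NumPy sketch (not from the original text; the predictor names and the relation $x_{3} = 2 + 3x_{1}$ are invented for the example) shows that with a perfectly collinear column the design matrix is rank-deficient, so $X^{\mathsf{T}}X$ is singular and the OLS formula is undefined, and that near-collinearity yields an invertible but severely ill-conditioned moment matrix:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100

# Two independent predictors, plus a third that is an exact linear
# function of the first: x3 = lambda0 + lambda1 * x1 (perfect collinearity).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 + 3.0 * x1

# Design matrix with an intercept column, and a response y.
X = np.column_stack([np.ones(n), x1, x2, x3])
y = 1.0 + 0.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# X has 4 columns but rank 3: the moment matrix X'X is singular,
# so the textbook OLS estimator (X'X)^{-1} X'y does not exist.
print("rank of X:", np.linalg.matrix_rank(X))    # 3, not 4
print("cond(X'X):", np.linalg.cond(X.T @ X))     # astronomically large

# lstsq falls back on the SVD and reports the rank deficiency; the
# returned coefficients are only one of infinitely many solutions.
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("lstsq-reported rank:", rank, "coefficients:", beta)

# With *near* collinearity (tiny noise added to x3), X'X becomes
# invertible but ill-conditioned, which is the common practical case:
# individual coefficient estimates then swing erratically under
# small perturbations of the data.
x3_near = x3 + rng.normal(scale=1e-6, size=n)
X_near = np.column_stack([np.ones(n), x1, x2, x3_near])
print("cond(X'X), near-collinear:", np.linalg.cond(X_near.T @ X_near))
```

The rank check mirrors the statement above that the design matrix has less than full rank under perfect multicollinearity, while the condition number of $X^{\mathsf{T}}X$ in the near-collinear case illustrates why coefficient estimates become unstable even though the model as a whole may still predict well.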