The Evolution of Linear Models in SAS: A Personal Perspective

Ramon C. Littell

2011

Phenomenal growth in computational power from 1970 through 2010 enabled a parallel expansion in linear model methodology. From humble beginnings in agriculture, linear model applications are now essential in sciences such as genetics, education, and biostatistics. Indeed, the meaning of "linear models" has evolved accordingly. Developers at SAS Institute have been at the forefront of the invention and implementation of these methods at the core of statistical science. Pathways will be traced in steps of SAS® procedures, beginning with GLM and REG, proceeding through VARCOMP, NLIN, MIXED and GENMOD, and arriving at NLMIXED and GLIMMIX. Along the way, some problems have disappeared, new ones have emerged, and others are still along for the ride.

The progression is described from the perspective of an outsider who has closely followed it and whose professional career was influenced by it. Linear models have been at the core of statistical methodology, and SAS procedures followed that pattern.

The year 1976 can be considered the birth date of SAS as we now recognize it. SAS·76 was the first release of SAS Incorporated, so one may think of the time since 1976 as the Common Era of SAS. The hallmark statistical procedure in SAS·76 was GLM. It was highly innovative for its time and caught the attention of statisticians and others engaged in data analysis across the US and beyond. GLM established a pattern for statistical procedures in SAS. Instead of a large number of special-purpose linear model applications, GLM provided a comprehensive platform that enabled a user to obtain solutions for most problems falling in the arena of linear models: regression analysis, analysis of variance and covariance, and multivariate analysis. Whereas most of the capabilities of GLM were inspired by statisticians working in agricultural research, GLM became the workhorse procedure for pharmaceutical statisticians and biostatisticians.

A few years later the REG procedure was released. It expanded regression capabilities to include diagnostic techniques that had been the subject of active research and had recently been published in a major textbook by Belsley, Kuh and Welsch (1980). Now the user not only had the capability to compute inferential statistics in regression analysis, but could also obtain statistics to help decide what variables to include in the analysis and to identify problematic data. The VARCOMP procedure provided estimates of variance components in mixed linear models, giving the user four choices of estimation method that have also been incorporated into later SAS procedures. This procedure, like GLM, brought forth computing machinery that opened the door to evaluation and comparison of statistical methods that had previously been infeasible. The NLIN procedure, although not really intended for linear models, permitted the formulation of models with linear components, such as segmented polynomials, as nonlinear models.

Capabilities for analysis of categorical data were limited in early versions of SAS. They were enhanced by the CATMOD and GENMOD procedures. CATMOD was based on the methodology of Grizzle, Starmer and Koch (1969), which pioneered the use of linear models for categorical data analysis. A later procedure, GENMOD, was based on the generalized linear models introduced by Nelder and Wedderburn (1972).

During the 1980s GLM added useful enhancements, but was nagged by the need for features to adequately accommodate problems related to the analysis of correlated data. The immensity of this need inspired the development of the MIXED procedure. Now data with random effects and repeated measures could be analyzed by incorporating those features into the statistical model for the data. Whereas GLM was built around the model for the expected value of the response variable, taking all independent variables as fixed, MIXED is built around models for both the expected value of the response, as a function only of the fixed variables, and the variance of the random effects. This turned the tables in the relation between statistical methodology and its computational implementation. MIXED revealed the need for further development of methods to adjust for the effects of using variance estimates in place of true variances.

Shortly following MIXED, macros were provided for fitting nonlinear mixed models and generalized linear mixed models using MIXED to make iterative computations. These macros later evolved into the procedures NLMIXED and GLIMMIX. The GLIMMIX procedure extends the capabilities of GLM and MIXED to generalized linear models.
