ROS Regression: Integrating Regularization with Optimal Scaling Regression
2019
We present a methodology for multiple regression analysis that
deals with categorical variables (possibly mixed with continuous ones), in
combination with regularization, variable selection and high-dimensional
data (P ≫ N). Regularization and optimal scaling (OS) are two important
extensions of ordinary least squares regression (OLS) that will be combined
in this paper. There are two data analytic situations for which optimal scaling
was developed. One is the analysis of categorical data; the other is
the need for transformations because of nonlinear relationships between predictors
and outcome. Optimal scaling of categorical data finds quantifications
for the categories, both for the predictors and for the outcome variables,
that are optimal for the regression model in the sense that they maximize
the multiple correlation. When nonlinear relationships exist, nonlinear
transformations of predictors and outcome maximize the multiple correlation
in the same way. We will consider a variety of transformation types; typically
we use step functions for categorical variables, and smooth (spline)
functions for continuous variables. Both types of functions can be restricted
to be monotonic, preserving the ordinal information in the data. In combination
with optimal scaling, three popular regularization methods will be
considered: Ridge regression, the Lasso and the Elastic Net. The resulting
method will be called ROS Regression (Regularized Optimal Scaling Regression).
The OS algorithm provides straightforward and efficient estimation
of the regularized regression coefficients, automatically yields the Group
Lasso and Blockwise Sparse Regression, and extends them with the ability
to preserve ordinal properties in the data. Extended examples are provided.
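The core idea can be illustrated on the simplest possible case. The following sketch (assumptions: one nominal predictor, quantifications taken as category means of the standardized outcome, which is the single-predictor OS solution; the lasso step reduces to soft-thresholding for a single standardized predictor) is a minimal illustration, not the paper's full alternating algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one categorical predictor with 4 categories
# and a numeric outcome driven by assumed category effects.
x = rng.integers(0, 4, size=200)
true_q = np.array([-1.0, 0.0, 0.5, 2.0])
y = true_q[x] + rng.normal(scale=0.3, size=200)
y = (y - y.mean()) / y.std()          # standardize the outcome

# Optimal scaling step: quantify each category by the mean outcome of
# its observations (the nominal OS solution for a single predictor),
# then standardize the quantified predictor.
q = np.array([y[x == c].mean() for c in range(4)])
z = q[x]
z = (z - z.mean()) / z.std()

# Regularization step: with one standardized predictor the lasso
# solution is soft-thresholding of the OLS coefficient.
lam = 0.1
b_ols = z @ y / (z @ z)
b_lasso = np.sign(b_ols) * max(abs(b_ols) - lam, 0.0)
```

In the full method, the OS step and the regularized regression step alternate over all predictors until convergence, and the quantifications can be restricted (e.g. to monotone step or spline functions) to preserve ordinal information.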