Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g., when there are groups of highly correlated variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieves satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.
Optimizing the quality of a product is a widespread task in industry. Products have to be manufactured such that they best fit some quality properties. Varying the product settings leads to different product qualities, and the aim of the manufacturer is to find the factor settings that simultaneously optimize the quality properties. The classical approach to solving such an optimization problem is based on response surface methodology. First, a designed experiment is used to collect data and to fit models capturing the relationship between the responses of interest and the factor settings. Those fitted models can then predict the quality properties for any design point of the experimental domain. Second, a desirability index is built to combine the predicted properties into a value belonging to the [0, 1] interval. This index provides a ranking of possible factor settings in the solution space, and the optimum can be found by an adequate optimization algorithm. But, as model predictions are subject to error, so are the desirability index and the optimal solution found. In practice, in the related literature and in design of experiments software, this error is neglected. This paper proposes an optimization methodology based on the fact that a desirability index is a random variable. The expectation of this index is taken as the criterion to be optimized and, since it can only be estimated, confidence and prediction intervals are constructed to take into account the propagation of the model error to the expected or predicted desirability index. The stochastic character of the index also leads to uncertainty about the optimum, and a methodology is proposed to build an equivalence zone containing optimal solutions that are not significantly different from one another. This methodology is illustrated on a simulated example and compared to the classical optimization methodology.
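To make the idea of a random desirability index concrete, here is a minimal sketch, not the paper's actual procedure. It assumes a linear "larger is better" desirability function per response, a geometric mean to combine them, and independent normal prediction errors; none of these choices are stated in the abstract. The function names (`linear_desirability`, `overall_desirability`, `simulate_desirability`) and all numeric values are hypothetical. The sketch compares the classical plug-in index, computed from the predicted means alone, with a Monte Carlo estimate of the expected index and simple empirical quantiles of the simulated index.

```python
import math
import random


def linear_desirability(y, low, high):
    """Assumed larger-is-better desirability: 0 at or below `low`, 1 at or above `high`."""
    return min(1.0, max(0.0, (y - low) / (high - low)))


def overall_desirability(ys, bounds):
    """Combine individual desirabilities with a geometric mean (an assumed aggregation)."""
    ds = [linear_desirability(y, lo, hi) for y, (lo, hi) in zip(ys, bounds)]
    return math.prod(ds) ** (1.0 / len(ds))


def simulate_desirability(means, sds, bounds, n_draws=5000, seed=0):
    """Compare the plug-in index with a Monte Carlo estimate of its expectation.

    `means` and `sds` stand in for the model predictions and their standard
    errors at one candidate factor setting. Independent normal errors are an
    assumption of this sketch.
    """
    rng = random.Random(seed)
    # Classical approach: evaluate the index at the predicted means, ignoring error.
    plug_in = overall_desirability(means, bounds)
    # Propagate the prediction error through the index by simulation.
    draws = sorted(
        overall_desirability([rng.gauss(m, s) for m, s in zip(means, sds)], bounds)
        for _ in range(n_draws)
    )
    expected = sum(draws) / n_draws
    # Empirical 2.5% and 97.5% quantiles of the simulated index.
    lower = draws[int(0.025 * (n_draws - 1))]
    upper = draws[int(0.975 * (n_draws - 1))]
    return plug_in, expected, (lower, upper)


if __name__ == "__main__":
    # Hypothetical predictions for two quality properties at one factor setting.
    bounds = [(0.0, 10.0), (0.0, 10.0)]
    plug_in, expected, interval = simulate_desirability([5.0, 8.0], [1.0, 1.0], bounds)
    print(f"plug-in index:        {plug_in:.3f}")
    print(f"expected index (MC):  {expected:.3f}")
    print(f"simulated 95% range:  ({interval[0]:.3f}, {interval[1]:.3f})")
```

Because the index is a nonlinear function of the responses, the plug-in value and the estimated expectation generally differ once prediction error is non-zero, and the gap tends to grow near the boundaries of the [0, 1] interval where the desirability function is clipped. Repeating the simulation over a grid of candidate settings would give a rough picture of which settings have overlapping ranges, in the spirit of the equivalence zone described above.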