Model Selection and Model Complexity: Identifying Truth Within A Space Saturated with Random Models

2005 
A framework for the analysis of model selection issues is presented. The framework separates model selection into two dimensions: the model-complexity dimension and the model-space dimension. The model-complexity dimension pertains to how the complexity of a single model interacts with its scoring by standard evaluation measures. The model-space dimension pertains to the interpretation of the totality of evaluation scores obtained. Central to the analysis is the concept of evaluation coherence, a property which requires that a measure not produce misleading model evaluations. Of particular interest is whether model evaluation measures are misled by model complexity. Several common evaluation measures (apparent error rate, the BD metric, and MDL scoring) are analyzed, and each is found to lack complexity coherence. These results are used to consider arguments for and against the Occam's razor paradigm as it pertains to overfit avoidance in model selection, and also to provide an abstract analysis of what the literature refers to as oversearch.

The machine learning and statistics literature contains much analysis of how such factors as the complexity of models, the number of models evaluated, and the distributions of true models and relevant features affect model selection and error bound estimation. In this article, we propose that these questions are clarified when one makes explicit a separation of model selection factors into two dimensions: the model-complexity dimension and the model-space dimension. Intuitively, the model-complexity dimension pertains to how the complexity of a single model affects the distribution of its evaluation scores, while the model-space dimension pertains to how the characteristics of model space affect the interpretation of the totality of evaluation scores.

We postulate a pristine, limiting-case set of assumptions that reflects an idealization of many high-dimensional applications (e.g., microarray, proteomic, and many other biomedically inspired analyses) currently the subject of intense investigation. In such an environment, the number of features is virtually limitless, and most have no correlation with the class to be predicted. Our idealization facilitates the study of central issues, and the results are argued to provide insight into more realistic settings in which the assumptions are relaxed.

We develop a notion of measure coherence. Coherence means, roughly, that the model evaluation measure behaves in a rational way when used to compare models. Of particular interest is the question of whether measures exhibit an a priori bias for or against models of high complexity as compared to simple models. We study the question in the abstract, as well as by applying the analysis to standard data likelihood, to the apparent error rate evaluation measure (both with and without cross-validation), to the Bayesian-Dirichlet (BD) metric, and to the minimum description length (MDL) scoring function. We present both analytical and numerical results demonstrating lack of coherence for the error rate measure (with a bias toward more complex models) and for MDL and the BD metric (with a bias toward less complex models). We interpret these results in the context of such previous research as that presented in [1, 2, 13, 16, 17, 20, 23, 26, 30].
Our analysis is enabled by the separation of the model-complexity dimension from the model-space dimension: issues that have often been attributed to model space or to model search are now seen to be directly rooted in the non-coherence of the measure.
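To make the idealized setting concrete, the following minimal simulation sketch (not from the paper; the classifier, sample size, and parameter choices are illustrative assumptions) shows how the apparent error rate rewards complexity when every feature is pure noise: as the number of noise features available to a lookup-table classifier grows, its training error falls toward zero even though its true error rate remains 0.5.

```python
# Minimal sketch (illustrative assumptions, not the paper's experiments):
# in the idealized setting of the abstract -- many features, none correlated
# with the binary class -- the apparent (training) error rate of a fitted model
# drops as the number of features it uses grows, while true error stays at 0.5.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 50               # small sample, as in high-dimensional biomedical settings
complexities = [1, 2, 4, 8]  # number of pure-noise binary features per model
n_trials = 500

def apparent_error(X, y):
    """Training error of a lookup-table classifier: each distinct feature
    pattern predicts the majority class observed for that pattern."""
    errors = 0
    patterns = [tuple(row) for row in X]
    for p in set(patterns):
        labels = y[[i for i, q in enumerate(patterns) if q == p]]
        errors += min(np.sum(labels == 0), np.sum(labels == 1))
    return errors / len(y)

for k in complexities:
    errs = []
    for _ in range(n_trials):
        y = rng.integers(0, 2, n_samples)        # class labels, fair coin
        X = rng.integers(0, 2, (n_samples, k))   # k noise features, independent of y
        errs.append(apparent_error(X, y))
    print(f"features={k}: mean apparent error = {np.mean(errs):.3f} (true error = 0.5)")
```

With 50 samples, the eight-feature model has enough distinct feature patterns to nearly memorize the labels, so its apparent error approaches zero, while the single-feature model's apparent error stays near 0.5; this systematic preference for the more complex of two equally worthless models is, in the paper's terminology, a failure of complexity coherence for the apparent error rate.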