Cross-validation pitfalls when selecting and assessing regression and classification models

Damjan Krstajic, Ljubomir Buturovic, David E. Leahy, Simon Thomas

2014

Background: We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications, including quantitative structure–activity relationship (QSAR) modelling. In this paper we describe and evaluate best practices that improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing, which enables routine use of previously infeasible approaches.

Keywords:

Regression analysis
Data mining
Cross-validation
Cloud computing
Bioinformatics
Computer science
Quantitative structure–activity relationship
Regression
Feature selection
Mean squared prediction error
