Ascertainment of the number of samples in the validation set in Monte Carlo cross validation and the selection of model dimension with Monte Carlo cross validation

2006 
Monte Carlo cross validation (MCCV) is applied to two data sets, containing 125 and 1643 near-infrared (NIR) spectra of biological samples, respectively, to determine the number of samples to leave out for validation in MCCV and, consequently, the dimension of the partial least squares (PLS) models. With a suitably chosen number of samples in the validation set, the appropriate number of latent variables (LV) may be selected correctly. The results show that the root mean squared error of calibration (RMSEC), the root mean squared error of cross validation (RMSECV) and the number of LVs become sensitive to the number of samples left out for validation when too many samples are left out. On this basis, RMSEC and RMSECV are suggested as criteria to help determine the number of samples to leave out for validation in MCCV. The method is easy and convenient to use. For a larger data set, more samples may be left out, but the suitable number of samples left out decreases if the measurement error level is high.
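The procedure described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that RMSECV is pooled over all random splits, and it uses a univariate least-squares line as a stand-in for a PLS model so that the example needs only the standard library. The names `mccv_rmsecv`, `fit_line`, `n_left_out` and `n_splits` are hypothetical and do not come from the paper.

```python
import math
import random


def mccv_rmsecv(x, y, fit, n_left_out, n_splits=100, seed=0):
    """Estimate RMSECV by Monte Carlo cross validation.

    In each split, n_left_out randomly chosen samples form the validation
    set and the remaining samples are used to fit the model.
    `fit(xs, ys)` must return a function that predicts y from one x.
    """
    n = len(y)
    if not 0 < n_left_out < n:
        raise ValueError("n_left_out must be between 1 and n - 1")
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(n_splits):
        validation = set(rng.sample(range(n), n_left_out))
        training = [i for i in range(n) if i not in validation]
        predict = fit([x[i] for i in training], [y[i] for i in training])
        squared_error += sum((y[i] - predict(x[i])) ** 2 for i in validation)
    # Assumption: errors are pooled over every split and validation sample.
    return math.sqrt(squared_error / (n_splits * n_left_out))


def fit_line(xs, ys):
    """Stand-in model (univariate least squares), NOT a PLS model."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda a: mean_y + slope * (a - mean_x)


if __name__ == "__main__":
    # Synthetic data for illustration only: a noisy straight line.
    rng = random.Random(1)
    x = [float(i) for i in range(40)]
    y = [2.0 * a + 1.0 + rng.gauss(0.0, 0.5) for a in x]
    # Scan the number of samples left out and watch how RMSECV responds.
    for n_left_out in (4, 10, 20, 30, 36):
        print(n_left_out, round(mccv_rmsecv(x, y, fit_line, n_left_out), 4))
```

With a real PLS implementation substituted for `fit_line`, the same scan could be repeated for each candidate number of latent variables, tracking RMSEC alongside RMSECV, to see where the estimates start to depend strongly on the number of samples left out.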