Assessing temporal data partitioning scenarios for estimating reference evapotranspiration with machine learning techniques in arid regions

2020 
Abstract
Recently, data-driven machine learning techniques have been widely applied to model reference evapotranspiration (ETo) under various climatic conditions, with differing numbers of sites and record lengths. A major issue in applying these models is the proper selection of training and testing data sets. Although some spatial generalization approaches have been recommended for this purpose, no local (temporal) data partitioning strategies have been specifically recommended for machine learning based ETo estimation. The present study evaluates different hold-out and k-fold validation temporal data partitioning strategies when using the gene expression programming (GEP) technique to estimate daily ETo in arid regions. The k-fold validation strategies used annual, monthly, and growing-season patterns as test data sets. Although the commonly used partitioning of the available patterns into training and testing sets gave accurate results, statistical analysis showed that the results obtained through k-fold validation were more reliable. Among the hold-out procedures, a two-block partitioning strategy with chronological data selection for training and testing provided the most accurate results (mean scatter index (SI) of 0.162). Fixing the extreme ETo values as the training data set in hold-out procedures provided the least accurate results, with considerable over- and underestimation of ETo values (mean SI of 0.506). Results based on hold-out approaches can be biased, or only partially valid, depending on which part of the time series is selected as test data. K-fold validation yielded the smallest over- and underestimation of ETo values. Among the k-fold validation strategies, using monthly patterns as the minimum affordable test size produced larger errors, while using the complete patterns of one growing season provided more accurate results.
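To make the partitioning schemes concrete, the following is a minimal Python sketch, under stated assumptions, of how a daily record could be split by a chronological two-block hold-out and by temporal k-fold validation with annual or monthly test blocks, plus the scatter index used to compare them. It does not implement the GEP model. The SI definition (RMSE divided by the mean observed value) is a common one assumed here, not taken from the abstract. The function names `scatter_index`, `two_block_holdout` and `block_kfold`, and the 75/25 train/test fraction, are hypothetical illustrations. Reading "chronologic" selection as "earlier block trains, later block tests" is also an assumption. A growing-season fold would need site-specific season dates, which the abstract does not give, so it is omitted.

```python
from collections import defaultdict
from datetime import date, timedelta
from math import sqrt


def scatter_index(observed, predicted):
    """Scatter index (SI), assumed here as RMSE / mean(observed)."""
    n = len(observed)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def two_block_holdout(dates, train_fraction=0.75):
    """Chronological two-block hold-out (hypothetical helper): the earliest
    records form the training block, the later records form the test block.
    train_fraction=0.75 is an illustrative assumption, not the study's value."""
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    cut = int(len(order) * train_fraction)
    return order[:cut], order[cut:]


def block_kfold(dates, key):
    """Temporal k-fold (hypothetical helper): group record indices by key(date)
    and hold out each group once as the test set.
    key=lambda d: d.year gives annual folds;
    key=lambda d: (d.year, d.month) gives monthly folds."""
    groups = defaultdict(list)
    for i, d in enumerate(dates):
        groups[key(d)].append(i)
    for k in sorted(groups):
        test = groups[k]
        held_out = set(test)
        train = [i for i in range(len(dates)) if i not in held_out]
        yield k, train, test


# Usage on three years of synthetic daily dates (no real ETo data involved).
dates = [date(2001, 1, 1) + timedelta(days=k) for k in range(3 * 365)]
annual = list(block_kfold(dates, key=lambda d: d.year))
monthly = list(block_kfold(dates, key=lambda d: (d.year, d.month)))
train_idx, test_idx = two_block_holdout(dates)
print(len(annual), len(monthly), len(train_idx), len(test_idx))  # → 3 36 821 274
```

In this sketch every record is tested exactly once across the k folds, whereas the hold-out scores only the final block. That difference is one way to see why a single hold-out score can depend on which part of the time series happens to be chosen as test data.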