Improving the spatial prediction of soil organic carbon using environmental covariates selection: A comparison of a group of environmental covariates

2022 
Abstract In the digital soil mapping (DSM) framework, machine learning models quantify the relationship between soil observations and environmental covariates. DSM studies have generally relied on the most commonly used covariates (MCC; e.g., topographic attributes, single-time remote sensing data, and legacy maps). Remote sensing time-series (RST) data can provide additional useful information for soil mapping. The main aim of this study is therefore to compare the MCC dataset, a monthly Sentinel-2 time-series of vegetation indices (the RST dataset), and the combination of the two (MCC + RST) for soil organic carbon (SOC) prediction in an arid agroecosystem in Iran. We used four machine learning algorithms: random forest (RF), Cubist, support vector machine (SVM), and partial least squares regression (PLSR). A total of 237 soil samples were collected at 0–20 cm depth. Five-fold cross-validation was used to evaluate model performance, and 50 bootstrap models were used to quantify prediction uncertainty. The Cubist model performed best with the MCC dataset (R² = 0.35, RMSE = 0.26%) and with the combined MCC + RST dataset (R² = 0.33, RMSE = 0.27%), while the RF model performed best with the RST dataset (R² = 0.10, RMSE = 0.31%). Soil properties explained the SOC variation in the MCC and combined datasets (66.35% and 50.82%, respectively), while NDVI was the most important controlling factor in the RST dataset (50.22%). Accordingly, the time-series vegetation indices did not improve SOC prediction accuracy. However, combining the MCC and RST datasets produced SOC maps with lower uncertainty. Future studies are therefore required to clarify the usefulness of remotely sensed time-series data, and their interrelationship with other environmental covariates, for predicting SOC in arid regions with low SOC content.
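The evaluation scheme described in the abstract (5-fold cross-validation for accuracy, 50 bootstrap models for uncertainty) can be illustrated with a minimal sketch. Everything here beyond the numbers 237, 5, and 50 is an assumption for illustration: the data are synthetic, a single-covariate least-squares line stands in for RF, Cubist, SVM, or PLSR, R² is computed as the coefficient of determination, and the 5th to 95th percentile range is one possible uncertainty measure, not necessarily the one the authors used.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x (stand-in for the paper's models)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def predict(model, xs):
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kfold_predictions(xs, ys, k, rng):
    """Return one out-of-fold prediction per sample from k-fold cross-validation."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    oof = [None] * len(xs)
    for f in range(k):
        test = set(idx[f::k])  # every k-th shuffled index forms one fold
        train = [i for i in idx if i not in test]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            oof[i] = predict(model, [xs[i]])[0]
    return oof


def bootstrap_interval(xs, ys, new_xs, n_boot, rng):
    """Fit n_boot models on resampled data; return (5th, 95th) percentile per new point."""
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        preds.append(predict(model, new_xs))
    bounds = []
    for j in range(len(new_xs)):
        col = sorted(p[j] for p in preds)
        bounds.append((col[int(0.05 * (n_boot - 1))], col[int(0.95 * (n_boot - 1))]))
    return bounds


# Synthetic stand-in data: 237 samples, one hypothetical covariate, noisy response.
rng = random.Random(42)
xs = [rng.uniform(0.0, 1.0) for _ in range(237)]
ys = [0.4 + 0.5 * x + rng.gauss(0.0, 0.25) for x in xs]

oof = kfold_predictions(xs, ys, k=5, rng=rng)
cv_rmse, cv_r2 = rmse(ys, oof), r2(ys, oof)
intervals = bootstrap_interval(xs, ys, new_xs=[0.2, 0.8], n_boot=50, rng=rng)

print(f"CV RMSE = {cv_rmse:.2f}, CV R2 = {cv_r2:.2f}")
for x, (lo, hi) in zip([0.2, 0.8], intervals):
    print(f"x = {x}: bootstrap 5th-95th percentile range [{lo:.2f}, {hi:.2f}]")
```

Pooling the out-of-fold predictions before computing RMSE and R² is one common convention; averaging per-fold metrics is another, and the abstract does not say which was used. In a mapping context the bootstrap range would be computed per grid cell, so a narrower range corresponds to the "lower uncertainty" reported for the combined dataset.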