Recursive partitioning improves paleosol proxies for rainfall

2019 
The bulk elemental composition of soil subsurface (B) horizons is influenced by environmental, biological, geological, and climatic factors. Because fossil soils (paleosols) are common in the geologic record, quantitative models that link climate to paleosol geochemistry are highly desirable in the paleoclimate community. Error associated with these models is typically reported as the root mean square error (RMSE) of a regression analysis and reflects the variability imparted by non-climatic influences on soil formation and the uncertainty associated with model fitting. However, for prediction purposes, the RMSE is well known to underestimate model uncertainty. In this work we re-evaluate a widely used transfer function for mean annual precipitation (MAP) based on the chemical index of alteration minus potassium (CIA-K) using data science best practices on two continental-scale soil data sets. Data set inter-comparisons and cross-validation of exponential regression models indicate that the root mean square prediction error (RMSPE) between CIA-K and MAP for soils representative of climates across the continental United States is around 299 mm, significantly higher than the currently accepted 182 mm RMSE. Further, CIA-K is unable to predict perhumid (>2000 mm MAP) climate zones. We show that transitioning from a simple regression framework to one of recursive partitioning via random forests can significantly increase prediction accuracy while automating variable selection. We introduce two new, widely applicable random forest models for MAP (RF-MAP) that use 10 elemental oxides as input variables and were calibrated on the Baylor University Soil Informatics (BU-SI) data set. RF-MAP version 1.0 (RF-MAP1.0) was generated using the entire BU-SI data set (n = 685) and can predict MAP values up to 6865 mm with an RMSPE of 395 mm. RF-MAP version 2.0 (RF-MAP2.0) was generated using a modification of the BU-SI data set (n = 642) and can predict MAP values up to ∼1600 mm with an RMSPE of 209 mm. Pruned regression trees provide insight into the mechanisms driving the random forest models and provide the first empirical confirmation of the sensitivity of soil elemental responses to global climate zones. The RF-MAP1.0 and RF-MAP2.0 models predict MAP values comparable to independent proxy estimates for a range of deep-time paleosols. We advocate for application of RF-MAP1.0 in settings where no a priori information on paleoclimate is available, and encourage the use of either RF-MAP1.0 or RF-MAP2.0 if users have independent constraints that paleo-MAP was below 1600 mm.
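The distinction the abstract draws between an in-sample RMSE and a cross-validated RMSPE can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the standard molar definition of CIA-K, 100 × Al2O3 / (Al2O3 + CaO + Na2O), with no carbonate correction applied to CaO, and it stands in for the exponential regression with an ordinary least squares fit on ln(MAP). The function names (`cia_k`, `fit_exponential`, `kfold_rmspe`), the fold count, and the synthetic data in the demo are all hypothetical choices made for illustration; none of the numbers are from the BU-SI data set.

```python
import math
import random

# Molar masses (g/mol) for converting weight-percent oxides to moles.
MOLAR_MASS = {"Al2O3": 101.96, "CaO": 56.08, "Na2O": 61.98}


def cia_k(wt_pct):
    """CIA-K = 100 * Al2O3 / (Al2O3 + CaO + Na2O), in molar proportions.

    Assumption: CaO is used as given (no correction for carbonate-bound Ca).
    """
    mol = {ox: wt_pct[ox] / MOLAR_MASS[ox] for ox in MOLAR_MASS}
    return 100.0 * mol["Al2O3"] / (mol["Al2O3"] + mol["CaO"] + mol["Na2O"])


def fit_exponential(x, y):
    """Fit y = A * exp(b * x) by least squares on ln(y) (a log-linear fit)."""
    n = len(x)
    log_y = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    b = sxy / sxx
    a = mean_ly - b * mean_x
    return math.exp(a), b


def rmse(observed, predicted):
    """Root mean square difference between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def in_sample_rmse(x, y):
    """RMSE when the model is scored on the same data it was fitted to."""
    A, b = fit_exponential(x, y)
    return rmse(y, [A * math.exp(b * xi) for xi in x])


def kfold_rmspe(x, y, k=5, seed=0):
    """RMSPE from k-fold cross-validation: each point is predicted by a
    model that was fitted without it."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    observed, predicted = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        A, b = fit_exponential([x[i] for i in train], [y[i] for i in train])
        observed += [y[i] for i in held_out]
        predicted += [A * math.exp(b * x[i]) for i in held_out]
    return rmse(observed, predicted)


if __name__ == "__main__":
    # Synthetic demo data only (hypothetical coefficients and noise level).
    rng = random.Random(1)
    x = [rng.uniform(20.0, 95.0) for _ in range(60)]
    y = [150.0 * math.exp(0.025 * xi) * math.exp(rng.gauss(0.0, 0.3)) for xi in x]
    print("in-sample RMSE :", round(in_sample_rmse(x, y), 1))
    print("5-fold RMSPE   :", round(kfold_rmspe(x, y), 1))
```

Because every held-out point is predicted by a model that never saw it, the cross-validated figure is the more honest estimate of error when the function is applied to new samples such as paleosols. A random forest model could be scored with the same held-out procedure, which makes prediction errors from the two approaches directly comparable.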