This submission contains the hydrological data used in the study. To access the data in Python, import the pickle module and run: Load_data = pickle.load(open(filename, 'rb')), where filename is the name of the file, including its full path. pandas version 1.3.5 might be required to open this file.
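The loading step can also be written with a context manager so that the file handle is closed after reading. This is a minimal sketch: the helper name load_pickle is hypothetical and not part of the submission, and the structure of the loaded object is not described in the source. Since unpickling can execute arbitrary code, it is safest to open only files from a trusted source.

```python
import pickle
from pathlib import Path


def load_pickle(filename):
    """Load a pickled object from `filename`.

    Hypothetical helper (not part of the submission). `filename` should
    include the full path to the file.
    """
    path = Path(filename)
    # Open in binary mode; the `with` block closes the file afterwards.
    with path.open("rb") as handle:
        return pickle.load(handle)
```

Usage would look like Load_data = load_pickle(filename), with filename set to the full path of the downloaded file. As noted in the description, pandas version 1.3.5 might be required for the object to load.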
Spatial variability exhibited by many field soils necessitates the use of stochastic methods for prediction of average solute movement. Data from a field experiment were analyzed to characterize the random nature of the velocity and dispersion of a solute (potassium bromide) in field-scale vertical transport experiments. Solute concentrations were measured at over fifty spatial locations and at six depths within the soil. The analysis indicates that solute velocities in deeper soil layers exhibit statistically homogeneous behavior. Dispersion was determined from breakthrough curves using a standard nonlinear regression model. These results will be presented, and the implications for modeling average solute behavior will be discussed.
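As an illustration of estimating velocity and dispersion from a breakthrough curve by nonlinear regression, the sketch below fits the one-dimensional convection-dispersion solution for an instantaneous pulse using SciPy's curve_fit. The abstract does not state which transport model or regression routine was used, so the model form, the function names, the default starting values, and the use of consistent but otherwise arbitrary units are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit


def pulse_breakthrough(t, v, D, depth=0.5, mass=1.0):
    """Concentration at `depth` and times `t` for an instantaneous pulse.

    Assumed model: 1-D convection-dispersion equation in an unbounded
    domain, with velocity `v`, dispersion coefficient `D`, and applied
    mass per unit area `mass` (all in consistent units). Times must be > 0.
    """
    t = np.asarray(t, dtype=float)
    spread = 4.0 * D * t
    return mass / np.sqrt(np.pi * spread) * np.exp(-(depth - v * t) ** 2 / spread)


def fit_velocity_dispersion(t, conc, depth, mass=1.0, p0=(0.05, 0.01)):
    """Estimate (v, D) from one breakthrough curve by least squares."""
    def model(t, v, D):
        return pulse_breakthrough(t, v, D, depth=depth, mass=mass)

    # Both parameters are constrained to be positive.
    (v_hat, D_hat), _ = curve_fit(model, t, conc, p0=p0, bounds=(1e-8, np.inf))
    return float(v_hat), float(D_hat)
```

Applying such a fit separately at each sampling location and depth would give a sample of velocity and dispersion estimates whose spatial statistics can then be examined. The starting values p0 need to be reasonably close to the true parameters for the optimizer to converge.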
Hydrological models are evaluated by comparisons with observed hydrological quantities such as streamflow. A model evaluation procedure should account for dominantly epistemic errors in hydrological data, such as model input precipitation and streamflow, and avoid type-2 errors (rejecting a good model). This study uses quantile random forest (QRF) to develop limits-of-acceptability (LoA) over streamflows that account for uncertainties in precipitation and streamflow values. A significant advantage of this method is that it can be used to evaluate models even at ungauged basins. This method was used to evaluate a hydrological model, the Sacramento Soil Moisture Accounting (SAC-SMA) model, over the St. Joseph River Watershed (SJRW) for both gauged and hypothetical ungauged scenarios. QRF defined wide LoAs that classified a large number of models as behavioral, suggesting the need for additional measures to develop a more discriminating inference procedure. The paper discusses why the LoAs defined by QRF were wide, along with some ways to define more discriminating LoAs. To further constrain the model, five streamflow-based signatures (i.e., autocorrelation function, Hurst exponent, baseflow index, flow duration curve, and long-term runoff coefficient) were used. The combination of LoAs over streamflow and streamflow-based signatures helped constrain the set of behavioral models in both the gauged and the ungauged scenarios. Among the signatures used in this study, the Hurst exponent and baseflow index were the most useful. All 1 million models evaluated in this study were eventually rejected as unfit-for-purpose.
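To make the signature-based evaluation more concrete, the sketch below computes three of the named signatures (lag autocorrelation, flow duration curve, and long-term runoff coefficient) and the fraction of simulated flows that fall inside given lower and upper limits. The function names, the Weibull plotting position for the flow duration curve, and the use of a coverage fraction as an acceptance measure are assumptions for illustration; the abstract does not specify how the signatures or the acceptance rule were computed, and the QRF step that produces the limits is not shown.

```python
import numpy as np


def lag_autocorrelation(q, lag=1):
    """Sample autocorrelation of streamflow `q` at a positive integer `lag`."""
    q = np.asarray(q, dtype=float)
    anomalies = q - q.mean()
    return float(np.sum(anomalies[:-lag] * anomalies[lag:]) / np.sum(anomalies ** 2))


def runoff_coefficient(q, p):
    """Long-term runoff coefficient: total streamflow over total precipitation.

    `q` and `p` must be in the same depth units over the same period.
    """
    return float(np.sum(q) / np.sum(p))


def flow_duration_curve(q):
    """Return (exceedance probability, flow) with flows sorted high to low."""
    q_sorted = np.sort(np.asarray(q, dtype=float))[::-1]
    n = q_sorted.size
    exceedance = np.arange(1, n + 1) / (n + 1)  # Weibull plotting position
    return exceedance, q_sorted


def fraction_within_limits(sim, lower, upper):
    """Fraction of simulated values lying inside [lower, upper] per time step."""
    sim = np.asarray(sim, dtype=float)
    inside = (sim >= np.asarray(lower)) & (sim <= np.asarray(upper))
    return float(inside.mean())
```

Under these assumptions, a simulation could be labeled behavioral when its coverage fraction and its signature values all fall within their respective limits; the thresholds themselves would need to come from the study's own procedure.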
Principal component analysis (PCA) is the most widely used method for dimensionality reduction, data reconstruction, feature extraction, and data visualization in geosciences. However, in its standard form, PCA makes no distinction between data points for which the associated measurement errors vary in both space and time. Using the backdrop of sea surface temperature (SST) data, a Bayesian variant of noisy principal component analysis (BaNPCA) was developed to incorporate observation uncertainty when performing PCA. The algorithm was first assessed using synthetic data sets. Comparison of BaNPCA results with current PCA techniques showed that BaNPCA has lower data reconstruction error; that is, for a given number of principal components, it explains more variance in SST data. Using the automatic relevance determination method, BaNPCA could correctly identify the appropriate number of principal components in the data. BaNPCA was shown to exhibit distinct advantages in filling missing values in the data when compared to existing methods. In addition, the extracted principal vectors from BaNPCA were found to be smoother and more representative of large-scale signals like El Niño–Southern Oscillation and Pacific Decadal Oscillation. To classify extreme states of All India summer monsoon rainfall, we used robust optimization that takes the principal components (PCs), along with the uncertainty computed by the BaNPCA algorithm, as inputs, thereby accounting for uncertainty in the data. Results from this study demonstrate the value of utilizing uncertainty information available with hydrologic data sets.
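For reference, the reconstruction error and explained variance used in such comparisons can be computed for standard PCA as sketched below. This is ordinary PCA via the singular value decomposition, not BaNPCA: it ignores observation uncertainty and cannot handle missing values. The function names are hypothetical.

```python
import numpy as np


def pca_reconstruct(X, n_components):
    """Reconstruct X (samples x variables) from its leading principal components.

    Returns the reconstruction and the fraction of variance explained.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # principal components
    X_hat = scores @ Vt[:n_components] + mean  # project back and un-center
    explained = np.sum(s[:n_components] ** 2) / np.sum(s ** 2)
    return X_hat, float(explained)


def reconstruction_rmse(X, X_hat):
    """Root-mean-square difference between the data and its reconstruction."""
    diff = np.asarray(X, dtype=float) - np.asarray(X_hat, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

For a fixed number of components, a method with lower reconstruction RMSE explains more of the variance in the data, which is the sense in which the abstract compares BaNPCA with existing PCA techniques.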