Groundwater Origin Determination in Historic Chemical Datasets Through Supervised Compositional Data Analysis: Brines of the Permian Basin, USA

2021 
Historic water quality databases often lack the measurements needed for focused investigations, such as determining the origin of the water. The U.S. Geological Survey's produced waters database contains nearly 7,000 good-quality samples for the Permian Basin, the single largest oil-producing province in the United States. However, fewer than 350 of those samples contain enough geochemical data (Br concentration or δ18O and δ2H composition) to determine whether the water originated as meteoric water or paleoseawater. To predict origin, three supervised methods were applied to isometric and pairwise log-ratio transformed major ion data from a subset of samples of known origin, with Br concentration and δ18O and δ2H composition excluded: linear discriminant analysis (isometric only), support vector machines (isometric and pairwise), and random forests (pairwise only). Validation error rates, computed on samples of known origin that were not used in model development (again excluding Br concentration and δ18O and δ2H composition), showed that no single method performed exceptionally well. An ensemble approach that assigns a classification only when all three methods agree reduced the validation error rate to 11% but left 28% of the data unclassified. This ensemble approach was then applied to the nearly 7,000 samples that contained only major ion concentrations (Cl, Ca, HCO3, Mg, Na, and SO4). Spatial mapping of the newly classified data provided insight into the distribution and flow of meteoric water and paleoseawater across the basin.
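The two log-ratio transforms and the unanimous-agreement rule described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state which isometric log-ratio basis was used, so the pivot-coordinate basis below is an assumption, as are the function names, the label strings, and the made-up concentrations in the usage note. The classifiers themselves (linear discriminant analysis, support vector machines, random forests) are not reproduced; only their combined vote is.

```python
import math
from itertools import combinations

# The six major ions named in the abstract.
IONS = ("Cl", "Ca", "HCO3", "Mg", "Na", "SO4")


def pairwise_log_ratios(sample):
    """Return ln(x_a / x_b) for every unordered pair of major ions."""
    return {
        f"ln({a}/{b})": math.log(sample[a] / sample[b])
        for a, b in combinations(IONS, 2)
    }


def ilr_pivot(sample):
    """Isometric log-ratio coordinates using a pivot basis (assumed basis).

    Coordinate i contrasts ion i with the geometric mean of the ions that
    follow it, scaled so the coordinates are orthonormal.
    """
    logs = [math.log(sample[ion]) for ion in IONS]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        k = len(rest)
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return coords


def unanimous_vote(predictions):
    """Return the shared label if all classifiers agree, else None."""
    first = predictions[0]
    return first if all(p == first for p in predictions) else None
```

As a usage note, a hypothetical sample such as `{"Cl": 60000.0, "Ca": 3000.0, "HCO3": 400.0, "Mg": 800.0, "Na": 35000.0, "SO4": 2500.0}` (invented values, not taken from the database) yields 15 pairwise log-ratios and 5 isometric coordinates. Both transforms depend only on ratios between ions, so rescaling every concentration by the same factor leaves them unchanged. A call like `unanimous_vote(["meteoric", "paleoseawater", "meteoric"])` returns `None`, which corresponds to a sample the ensemble leaves unclassified.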