Development and Comparison of Machine Learning Models for Water Multidimensional Classification

2021 
Abstract We propose four new models (WClassCB, WClassVL, WClassVP, and WClassVR) for water classification using Categorical Boosting (CatBoost) and Support Vector Machines (SVM) with three kernels: linear, polynomial, and radial basis function. The new models were compared with the recently proposed WClassHLR (seven hybrid log-ratio) model, which is based on linear discriminant analysis and canonical analysis techniques. A training database (50,000 samples) and an independent validation database (8,000 samples) of ionic charge-balanced concentrations of four cations (Ca, Mg, Na, and K) and four anions (SO4, Cl, HCO3, and CO3) were generated through Monte Carlo simulations. The initial 16 classes were assigned from the highest cation and anion molar concentrations (the GMC criterion, i.e. the greater molar concentration model). Seven hybrid log-ratio transformations were used as features for training and external validation of the multidimensional classification models. These models generate probability values for each output class, which allows hybrid water types to be determined and increases the number of possible water types to 256. The WClassCB model showed the best accuracy on the training set. However, the WClassVL model is the recommended procedure because it generalizes better than the other models on the external validation set. The new models outperform the recently proposed WClassHLR, with a difference of up to 7%. The usefulness of all models (WClassHLR, WClassCB, WClassVL, WClassVP, and WClassVR) is illustrated by four applications to groundwater samples from India and Nigeria. All models have difficulty classifying real samples when there is more than one major cation or anion, but they can recover the classification by suggesting hybrid water types. The new computer program WaterClaSys_ML has been developed for applying these new models.
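The GMC labelling step described in the abstract can be illustrated with a minimal Python sketch. It assumes each sample is a mapping from ion name to molar concentration. The function name `gmc_class`, the "cation-anion" label format, and the sample values are hypothetical choices made for illustration and are not taken from the paper or from WaterClaSys_ML.

```python
# Minimal sketch of the GMC (greater molar concentration) criterion:
# the class is the pair (dominant cation, dominant anion).
# Names, label format, and sample values are illustrative assumptions.

CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("SO4", "Cl", "HCO3", "CO3")

# 4 cations x 4 anions give the 16 initial classes.
ALL_CLASSES = [f"{c}-{a}" for c in CATIONS for a in ANIONS]


def gmc_class(sample):
    """Return the GMC label for one sample.

    `sample` maps each ion name to its molar concentration.
    On an exact tie, max() keeps the ion listed first above.
    """
    top_cation = max(CATIONS, key=lambda ion: sample[ion])
    top_anion = max(ANIONS, key=lambda ion: sample[ion])
    return f"{top_cation}-{top_anion}"


# Made-up concentrations, used only to exercise the function.
sample = {
    "Ca": 2.1, "Mg": 0.8, "Na": 1.5, "K": 0.1,
    "SO4": 0.6, "Cl": 1.2, "HCO3": 3.4, "CO3": 0.05,
}
print(gmc_class(sample))  # Ca-HCO3
```

In this sample Na is not far behind Ca, which is the kind of case where a single hard label hides information and the per-class probabilities of the trained models become useful for suggesting a hybrid water type.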