Machine learning approach identifies water sample source based on microbial abundance.

2021 
Abstract: Water quality can change along a river system due to differences in adjacent land use patterns and discharge sources. These variations can induce rapid responses in the aquatic microbial community, which may therefore serve as an indicator of water quality characteristics. In the current study, we used a random forest model to predict water sample sources from three river ecosystems along a gradient of anthropogenic disturbance (i.e., a less disturbed mountainous area, an urban area receiving wastewater discharge, and an agricultural area where pesticides and fertilizers are applied) based on environmental physicochemical indices (PCIs), microbiological indices (MBIs), and their combination. Results showed that among the PCI-based models, using conventional water quality indices as inputs provided markedly better prediction of water sample source than using pharmaceutical and personal care products (PPCPs), and much better prediction than using polycyclic aromatic hydrocarbons (PAHs) and substituted PAHs (SPAHs). Among the MBI-based models, using the abundances of the top 30 bacteria combined with pathogenic antibiotic resistant bacteria (PARB) as inputs achieved the lowest median out-of-bag error rate (9.9%) and a higher median kappa coefficient (0.8694), while adding fungal inputs reduced the kappa coefficient. The model based on the top 30 bacteria still showed an advantage over models based on PCIs or on the combination of PCIs and MBIs. As 16S rRNA sequencing technology improves and more data become available, the proposed method offers an economical, rapid, and reliable way to identify water sample sources from abundance data of microbial communities.
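The two metrics reported above, the out-of-bag error rate and the kappa coefficient, can both be computed from a list of true sample sources and the corresponding out-of-bag predictions. The sketch below is a minimal illustration, assuming the kappa reported is Cohen's kappa; the function names and the twelve example labels are hypothetical and are not data from the study.

```python
from collections import Counter


def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted source differs from the true source."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between predictions and truth, corrected for chance."""
    n = len(y_true)
    p_observed = 1.0 - error_rate(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that truth and prediction coincide
    # if they were drawn independently from their own label frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for illustration only (not values from the paper).
y_true = ["mountain"] * 4 + ["urban"] * 4 + ["agricultural"] * 4
y_pred = (
    ["mountain"] * 3 + ["agricultural"]      # one mountain sample misassigned
    + ["urban"] * 3 + ["agricultural"]       # one urban sample misassigned
    + ["agricultural"] * 4
)

oob_error = error_rate(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"error rate = {oob_error:.4f}, kappa = {kappa:.4f}")
```

Because kappa discounts the agreement expected by chance, it can differ noticeably from simple accuracy when the predicted classes are unevenly distributed, which is one reason to report it alongside the error rate.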