A method to increase the number of positive samples for machine learning-based urban waterlogging susceptibility assessments

2021 
The frequent occurrence of urban waterlogging seriously affects people’s lives and the national economy. Using machine learning (ML) methods to spatially assess urban waterlogging susceptibility is critical for reducing the losses caused by such disasters. Binary ML classifiers for this task are best trained on equal numbers of positive and negative samples. In most cases, however, researchers can obtain only a relatively small number of historical waterlogging locations (positive samples), which in turn limits the number of negative samples that can be selected and degrades the performance of the trained classifiers. To address this issue, we propose an optimized seed spread algorithm (OSSA) that estimates potential inundated areas from the spatial distribution of elevation and natural waters, thereby increasing the number of positive samples. The primary urban area of Guangzhou, China, was selected as the study region, and random forest was selected as the evaluation algorithm. We further employed two ML methods, support vector machine and logistic regression, to verify the quality of the increased positive samples. The results indicate that, across the three tested ML methods, the OSSA-based positive samples yield higher area under the curve (AUC) values than the original positive samples, indicating that the OSSA can be a suitable approach to increasing the number of positive samples for such studies. We believe that this study advances ML-based waterlogging susceptibility assessment, which could be valuable for developing countries where intensive hydrologic monitoring is lacking.
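The general idea of spreading from a known waterlogging location over an elevation grid can be illustrated with a minimal sketch. This is not the OSSA itself, whose optimization and treatment of natural waters are not specified in the abstract. It is a plain breadth-first region growing under assumed rules: the function name `seed_spread`, the `max_rise` threshold, the 4-neighbour connectivity, and the toy grid `demo_grid` are all hypothetical choices made for illustration.

```python
from collections import deque

def seed_spread(elevation, seed, max_rise=0.5):
    """Grow a region from `seed` over 4-connected cells.

    Assumption (not from the paper): a cell joins the region when its
    elevation is at most the seed's elevation plus `max_rise`.
    Returns the set of (row, col) cells in the grown region.
    """
    rows, cols = len(elevation), len(elevation[0])
    limit = elevation[seed[0]][seed[1]] + max_rise
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and elevation[nr][nc] <= limit:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# Hypothetical toy elevation grid (metres); the low cell at (3, 3) is
# not connected to the seed through low ground, so it is not reached.
demo_grid = [
    [5.0, 5.0, 5.0, 5.0],
    [5.0, 1.0, 1.2, 5.0],
    [5.0, 1.3, 4.0, 5.0],
    [5.0, 5.0, 5.0, 1.0],
]
extra_positives = seed_spread(demo_grid, seed=(1, 1), max_rise=0.5)
```

In this toy case a single recorded location at `(1, 1)` grows into three candidate positive cells, which shows how a spread step of this kind could enlarge the positive sample set before an equal number of negative samples is drawn.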