A novel index to evaluate discretization methods: A case study of flood susceptibility assessment based on random forest

2021 
Abstract Selecting a suitable discretization method (DM) for spatially continuous variables (SCVs) is critical in ML-based natural hazard susceptibility assessment. However, few studies have considered the influence of the selected DM or how to efficiently select a suitable DM for each SCV. This study addresses both issues. The information loss rate (ILR), an index based on information entropy, appears usable for selecting the optimal DM for each SCV. However, the ILR fails to show the actual influence of discretization, because it considers only how far the total amount of information in the discretized variable departs from that of the original SCV. To address this, we propose the information change rate (ICR), an index that focuses on the amount of information changed by discretization at each cell, enabling the identification of the optimal DM. We develop a case study using Random Forest (training/testing ratio of 7:3) to assess flood susceptibility in Wanan County, China. Approaches based on the area under the curve and on susceptibility maps were used to compare the ILR and ICR. The results show that the ICR-based optimal DMs are more rational than the ILR-based ones in both cases. Moreover, we observed that the ILR values are unnaturally small (