Insights into Ensemble Learning-based Data-Driven Model for Safety-Related Property of Chemical Substances

2022 
Abstract In the process industries, risk assessment based on the characteristics of chemicals can help prevent accidents caused by flammable and combustible liquids and gases. However, its application is limited by the lack of safety-related property data for many chemicals of interest, which drives demand for accurate predictive models that evaluate the inherent safety implications of chemicals. In this research, stacking-based ensemble learning is comprehensively investigated for predicting safety-related properties to support risk assessment. Using molecular structure-based features, individual and ensemble models are built with heterogeneous machine learning (ML) methods and compared. The systematic ensemble learning workflow is demonstrated through a case study on the flash points of chemical substances. Several representative ML methods are considered, including multiple linear regression, extreme learning machine, feedforward neural network, and support vector machine. The results show that ensemble models achieve higher predictive accuracy than standard individual ML models, indicating that ensemble learning is effective in improving model performance. Moreover, external evaluations against existing models, together with internal analyses across functional group-based organic compound families and structural feature-based data-driven categories, are carried out to assess model reliability. Ensemble learning is thus demonstrated to be an effective approach for high-performance predictive modeling in safety-related risk assessment.
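The stacking idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (solve, fit_linear, fit_knn, fit_stacking), the choice of a k-nearest-neighbour regressor as a hypothetical stand-in for the nonlinear base models (ELM, FNN, SVM), the 5-fold out-of-fold scheme, the linear meta-learner, and the synthetic two-descriptor data are all assumptions, chosen so the example runs with the Python standard library alone.

```python
# Minimal stacking-regression sketch using only the Python standard library.
# Assumptions (not from the paper): base learners, fold scheme, meta-learner,
# and the synthetic data below are hypothetical illustrations.
from statistics import mean


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(X, y, ridge=1e-8):
    """Multiple linear regression via the normal equations (tiny ridge term for stability)."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    w = solve(xtx, xty)
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_knn(X, y, k=3):
    """k-nearest-neighbour regressor (Euclidean distance); hypothetical nonlinear base model."""
    data = list(zip(X, y))

    def predict(x):
        nearest = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))[:k]
        return mean(t for _, t in nearest)

    return predict


BASE_LEARNERS = [fit_linear, fit_knn]  # stand-ins for MLR and the nonlinear ML methods


def fit_stacking(X, y, n_folds=5):
    """Level 0: out-of-fold base predictions; level 1: linear meta-learner on those predictions."""
    n = len(X)
    oof = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit(X_tr, y_tr)
            for i in test_idx:
                oof[i][j] = model(X[i])  # prediction on a sample the model never saw
    meta = fit_linear(oof, y)  # meta-learner is trained only on out-of-fold predictions
    bases = [fit(X, y) for fit in BASE_LEARNERS]  # refit base models on all data
    return lambda x: meta([b(x) for b in bases])


if __name__ == "__main__":
    # Synthetic toy data with two made-up descriptors (NOT real flash-point data).
    X = [[i / 10, (i % 7) / 7] for i in range(40)]
    y = [3.0 * a + 2.0 * b ** 2 for a, b in X]
    model = fit_stacking(X, y)
    print(f"stacked prediction at [1.5, 0.5]: {model([1.5, 0.5]):.2f}")
```

The key design point is that the meta-learner is fitted on out-of-fold predictions, so it learns how much to trust each base model on data that model did not see; fitting it on in-sample predictions would instead reward whichever base model overfits most.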