Nonlinear QSAR models with high-dimensional descriptor selection and SVR improve toxicity prediction and evaluation of phenols on Photobacterium phosphoreum

2015 
Abstract Assessing the risk of chemicals is an important task in environmental protection. In this paper, we developed quantitative structure–activity relationship (QSAR) methods to evaluate the toxicity of phenols to Photobacterium phosphoreum, which is an important indicator of water quality. We first built a support vector regression (SVR) model using three descriptors; the SVR model (t = 2) had the highest external prediction ability (MSE_ext = 0.068, Q²_ext = 0.682), about 40% higher than that of the literature model. Second, to identify more effective descriptors, we applied in-house methods to select descriptors with clear meanings from the 2835 descriptors calculated by PCLIENT and used them to construct SVR models. Our results showed that the twenty new QSAR models significantly increased the standard regression coefficient on the test set (MSE_ext values ranged from 0.003 to 0.063 and Q²_ext values ranged from 0.708 to 0.985). A Y-randomization (response permutation) test and different splits of the training/test datasets also supported the excellent predictive power of the best SVR model. We further evaluated the regression significance of our SVR model and the importance of each single descriptor in the model through interpretability analysis. Our work provides useful theoretical understanding of the toxicity of phenol analogues.
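The abstract reports external validation through MSE_ext and Q²_ext but does not define them. A minimal sketch of one common way to compute the two statistics is given below, assuming that Q²_ext is taken as one minus the ratio of the test-set prediction error sum of squares to the test-set deviation from the training-set mean. Several definitions of Q²_ext are in use, and the paper's exact formula may differ. The function name `external_validation` and the toy numbers are illustrative only and are not taken from the paper.

```python
def external_validation(y_test, y_pred, y_train_mean):
    """Return (MSE_ext, Q2_ext) for an external test set.

    Assumed definitions (not confirmed by the source):
      MSE_ext = PRESS / n_test
      Q2_ext  = 1 - PRESS / sum((y_i - mean(y_train))^2)
    where PRESS is the sum of squared prediction errors on the test set.
    """
    if len(y_test) != len(y_pred) or len(y_test) == 0:
        raise ValueError("y_test and y_pred must be non-empty and equal in length")

    # Prediction error sum of squares on the external test set
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))

    # Spread of the test responses around the training-set mean
    ss_ref = sum((obs - y_train_mean) ** 2 for obs in y_test)
    if ss_ref == 0:
        raise ValueError("test responses do not vary around the training mean")

    mse_ext = press / len(y_test)
    q2_ext = 1.0 - press / ss_ref
    return mse_ext, q2_ext


# Toy values, made up for illustration (not data from the paper)
toy_observed = [1.0, 2.0, 3.0]
toy_predicted = [1.1, 1.9, 3.2]
mse_ext, q2_ext = external_validation(toy_observed, toy_predicted, y_train_mean=2.0)
```

Under these definitions, a model that only predicts the training-set mean scores Q²_ext = 0 and a perfect model scores 1, which gives a sense of scale for the reported range of 0.708 to 0.985.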