ADME properties evaluation in drug discovery: Prediction of plasma protein binding using NSGA-II combining PLS and consensus modeling

2017 
Abstract The plasma protein binding (PPB) affinity of a drug compound strongly influences its pharmacodynamic behavior because it affects drug uptake and distribution. In this study, we collected a dataset of 1830 drug compounds from several accessible sources. A descriptor pool composed of four types of descriptors (2-D, 3-D, E-state and MACCS) was first built, and the non-dominated sorting genetic algorithm (NSGA-II) combined with partial least squares (PLS) regression was applied to select important descriptors for model building. We then obtained a consensus model (five-fold cross-validation: Q² = 0.750; RMSE = 16.151) based on five predictive models built using random forest (RF), support vector machine (SVM), Cubist, Gaussian process (GP), and Boosting. A test set and two external validation datasets were used to assess its robustness and practicality. For the test set, R²_T = 0.787 and RMSE_T = 14.154; for the two external datasets, R²_Ex = 0.704 and 0.703, and RMSE_Ex = 18.194 and 17.233, respectively. Additionally, in line with OECD principles, y-randomization, a Williams plot and scaffold analyses were used to validate the reliability and practical applicability domain of the predictive model. Overall, the consensus model shows good prediction performance and generalization ability in predicting PPB. After analyzing the important descriptors selected by NSGA-II and RF, we concluded that the PPB of a drug compound is mainly related to its lipophilicity, aromatic rings, and partial charge properties. In summary, this study developed a robust and practical consensus model for PPB prediction that could be used for distribution evaluation and risk assessment in the early stages of drug development.
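As an illustration of how a consensus prediction and the reported statistics (RMSE and a squared correlation-type coefficient such as R²) can be computed, the following is a minimal sketch, not the authors' implementation. It assumes the consensus is an unweighted mean of the individual models' predictions, which the abstract does not specify; the function names and the toy values are hypothetical and are not data from the study.

```python
import math


def consensus_predict(per_model_predictions):
    """Average the predictions of several models, compound by compound.

    Assumption: an unweighted mean; the abstract does not state how the
    five models are actually combined.
    """
    n_models = len(per_model_predictions)
    return [sum(values) / n_models for values in zip(*per_model_predictions)]


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy PPB values in percent bound (made up for illustration only).
observed = [90.0, 50.0, 10.0, 70.0]
model_a = [85.0, 55.0, 20.0, 60.0]  # hypothetical predictions of one model
model_b = [95.0, 45.0, 10.0, 80.0]  # hypothetical predictions of another

consensus = consensus_predict([model_a, model_b])
consensus_rmse = rmse(observed, consensus)
consensus_r2 = r_squared(observed, consensus)
```

In this toy case the errors of the two models partly cancel, so the averaged prediction is closer to the observed values than either model alone, which is the usual motivation for consensus modeling.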