Data for: Assessment of Several Machine Learning Methods Towards Reliable Prediction of Hormone Receptor Binding Affinity

2017 
Abstract

We examined the performance of several popular machine learning methods, including multiple linear regression (MLR), support vector regression (SVR) and Gaussian process regression (GPR), for quantitative prediction of estrogen receptor (ER) binding affinity. Particular attention is devoted to compiling an accurate and precise dataset of 1589 experimental binding affinities (logRBA) from the Estrogenic Activity Database to train and validate the models. We address issues related to the accuracy and precision of the experimental data, and to the choice between binding affinities measured on human ER and those measured across species. The SVR and GPR models performed best, with root-mean-square errors below 1 log unit, comparable to the experimental precision. We further examined the use of functional group analysis, nearest-neighbour distance and confidence interval estimates to flag large outliers. This work provides a test set of precise experimental data that may be used in future benchmarking studies of other machine learning models or free energy calculations.
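Two quantities from the abstract can be made concrete: the root-mean-square error (RMSE) used to compare predicted and experimental logRBA values, and a nearest-neighbour distance used to flag compounds that lie far from the training data. The following is a minimal, standard-library sketch under stated assumptions. It assumes each compound is represented as a numeric descriptor vector and that distance is Euclidean. The helper names (`rmse`, `nearest_neighbour_distance`, `flag_outliers`) and the distance cutoff are hypothetical illustrations, not the authors' descriptors, code or threshold.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between experimental and predicted logRBA values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def nearest_neighbour_distance(query: Sequence[float],
                               train: Sequence[Sequence[float]]) -> float:
    """Euclidean distance from one descriptor vector to its closest training vector.

    Assumption: compounds are encoded as equal-length numeric descriptor vectors.
    """
    return min(math.dist(query, x) for x in train)


def flag_outliers(queries: Sequence[Sequence[float]],
                  train: Sequence[Sequence[float]],
                  threshold: float) -> List[bool]:
    """Return True for queries whose nearest training neighbour is farther than
    `threshold`. The cutoff is a hypothetical, user-chosen value."""
    return [nearest_neighbour_distance(q, train) > threshold for q in queries]


if __name__ == "__main__":
    # Toy descriptor vectors, invented for illustration only.
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    queries = [(0.1, 0.1), (5.0, 5.0)]
    print(flag_outliers(queries, train, threshold=1.0))  # [False, True]
    print(round(rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]), 4))  # 0.6455
```

A distance-based flag of this kind only signals that a prediction is an extrapolation. It complements, rather than replaces, the model-based confidence intervals that a GPR model provides.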