Solubility Challenge revisited after 10 years, with multi-lab shake-flask data, using tight (SD ~0.17 log) and loose (SD ~0.62 log) test sets

2019 
Ten years ago we issued, in conjunction with the Journal of Chemical Information and Modeling, an open prediction challenge to the cheminformatics community. Would they be able to predict the intrinsic solubilities of 32 druglike compounds using only a high-precision set of 100 compounds as a training set? The "Solubility Challenge" was a widely recognized success and spurred many discussions about the prediction methods and the quality of the data. Despite the obvious limitations of the challenge, the conclusions were somewhat unexpected. Although contestants employed the entire spectrum of approaches then available for predicting aqueous solubility and had an extremely tight data set at their disposal, it was not possible to identify the best methods for predicting aqueous solubility: a variety of methods and combinations all performed equally well (or badly). Several authors have since suggested that it is not the poor quality of the solubility data that limits the accuracy of the predictions, but the deficient ...