True Accuracy of Fast Scoring Functions to Predict High-Throughput Screening Data from Docking Poses: The Simpler the Better.
2021
Hundreds of fast scoring functions have been developed over the last 20 years to predict binding free energies from three-dimensional structures of protein-ligand complexes. Despite numerous promising statistical results, we believe that none of them has been properly validated for routine prospective high-throughput virtual screening, mostly because in silico screening challenges usually rely on artificially built and biased datasets. Here we carry out a fully unbiased evaluation of four scoring functions (Pafnucy, ΔvinaRF20, IFP, and GRIM) on an in-house data collection of high-confidence experimental screening data (LIT-PCBA) covering about 3 million data points on 15 diverse pharmaceutical targets. All four scoring functions were applied to rescore the docking poses of LIT-PCBA compounds under conditions that exactly mimic standard drug discovery scenarios, and were compared by their propensity to enrich true binders in the top 1% of the ranked hit lists. Interestingly, rescoring based on simple interaction fingerprints or interaction graphs outperforms state-of-the-art machine learning and deep learning scoring functions in most cases. The study notably highlights the strong tendency of deep learning methods to predict affinity values within a very narrow range centered on the mean value of the training samples. Moreover, it suggests that knowledge of pre-existing binding modes is key to detecting the most potent binders.
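The comparison criterion above, enrichment of true binders in the top 1% of a ranked list, can be illustrated with a minimal sketch. It assumes the commonly used enrichment factor definition (hit rate in the top fraction divided by the hit rate in the whole library); the abstract does not state the exact formula used in the study. The function name `enrichment_factor`, its parameters, and the rounding of the top-list size are hypothetical choices made for illustration only.

```python
def enrichment_factor(scores, labels, fraction=0.01, higher_is_better=True):
    """Enrichment factor of actives in the top `fraction` of a ranked list.

    scores: one score per compound (assumption: a single rescoring value each).
    labels: 1 for a confirmed active, 0 for an inactive.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Size of the top list; rounding to the nearest integer is an assumption.
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compound indices from best to worst score.
    order = sorted(range(n_total), key=lambda i: scores[i],
                   reverse=higher_is_better)
    hits_in_top = sum(labels[i] for i in order[:n_top])

    # Hit rate in the top list relative to the hit rate in the full library.
    return (hits_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy data: 200 compounds, 4 actives, one of which ranks first.
    toy_scores = [float(200 - i) for i in range(200)]
    toy_labels = [0] * 200
    for idx in (0, 50, 100, 150):
        toy_labels[idx] = 1
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.01))
```

A value above 1 means the top of the list holds more actives than random selection would give; a value of 0 means no active was retrieved in the top fraction.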