Modeling protein-ligand interactions is a challenging task that has been approached from an array of perspectives. From physics-based computational approaches to vast deep learning pipelines, in silico methods hold promise in reducing experimental overhead in otherwise tedious and costly drug discovery campaigns. We introduce the Protein-Ligand Equivariant Transformer (ProLET), a generalizable model built upon chemically inspired SE(3)-equivariant geometric deep learning. We evaluate ProLET on a wide range of established standards, including the notoriously difficult PoseBusters and Merck’s FEP benchmarks, consistently demonstrating superior performance in binding affinity prediction and pose estimation. We demonstrate its effectiveness across different stages of drug discovery, showing that ProLET can be used for lead optimization and hit identification as well as for prioritizing compounds that are selective towards a desired target. By bridging the gap between accuracy, efficiency, and generalizability, ProLET stands as a powerful and adaptive resource, signifying a step towards safe and reliable AI-driven drug discovery.
In this study, we integrate Ro5's target evaluation (SpectraView) and deep-learning-driven virtual screening (HydraScreen) tools with Strateos' automated robotic cloud lab, optimized for ultra-high-throughput screening, to experimentally validate Ro5's tools. This integrated approach significantly accelerates target identification and hit discovery. Using SpectraView to select IRAK1 as the focal point of our investigation, we prospectively validate the HydraScreen structure-based deep learning model. We identify 23.8% of all IRAK1 hits within the top 1% of ranked compounds. HydraScreen also outperforms traditional virtual screening techniques and offers advanced features such as ligand pose confidence scoring. Simultaneously, we identify three potent (nanomolar) scaffolds from our compound library, two of which represent novel candidates for IRAK1 and hold potential for future development. Our platforms and innovative tools promise to expedite the early stages of drug discovery.
Background: Target identification and hit identification can be transformed through the application of biomedical knowledge analysis, AI-driven virtual screening and robotic cloud lab systems. However, there are few prospective studies that evaluate the efficacy of such integrated approaches. Results: We integrate our in-house-developed target evaluation (SpectraView) and deep-learning-driven virtual screening (HydraScreen) tools with an automated robotic cloud lab designed explicitly for ultra-high-throughput screening, enabling us to validate these platforms experimentally. By employing our target evaluation tool to select IRAK1 as the focal point of our investigation, we prospectively validate our structure-based deep learning model. We identify 23.8% of all IRAK1 hits within the top 1% of ranked compounds. The model outperforms traditional virtual screening techniques and offers advanced features such as ligand pose confidence scoring. Simultaneously, we identify three potent (nanomolar) scaffolds from our compound library, two of which represent novel candidates for IRAK1 and hold promise for future development. Conclusion: This study provides compelling evidence that SpectraView and HydraScreen significantly accelerate target identification and hit discovery. By leveraging Ro5's HydraScreen and Strateos' automated labs in hit identification for IRAK1, we show how AI-driven virtual screening with HydraScreen could offer high hit discovery rates and reduce experimental costs. Scientific contribution: We present an innovative platform that leverages knowledge-graph-based biomedical data analytics and AI-driven virtual screening integrated with robotic cloud labs. Through an unbiased, prospective evaluation we show the reliability and robustness of HydraScreen in virtual and high-throughput screening for hit identification in IRAK1. Our platforms and innovative tools can expedite the early stages of drug discovery.
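The headline figure in this abstract, 23.8% of all IRAK1 hits within the top 1% of ranked compounds, is a hit-recovery statistic computed over a ranked library. The sketch below shows one common way such a number can be calculated, together with the related enrichment factor. It is an illustration only: the function names, the convention that a higher score means a better rank, the rounding of the top slice, and the toy data are all assumptions, not the study's actual pipeline or results.

```python
def _top_counts(scores, is_hit, fraction):
    """Return (hits in top slice, size of top slice, total hits, library size)."""
    if len(scores) != len(is_hit):
        raise ValueError("scores and is_hit must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    total_hits = sum(1 for h in is_hit if h)
    if total_hits == 0:
        raise ValueError("at least one experimental hit is required")
    # Size of the top slice, rounded to the nearest whole compound (assumption).
    n_top = max(1, round(n_total * fraction))
    # Assumption: a higher score means the compound is ranked earlier.
    ranked = sorted(zip(scores, is_hit), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, h in ranked[:n_top] if h)
    return hits_in_top, n_top, total_hits, n_total


def hit_recovery_at_fraction(scores, is_hit, fraction=0.01):
    """Share of all experimental hits found in the top `fraction` of the ranking."""
    hits_in_top, _, total_hits, _ = _top_counts(scores, is_hit, fraction)
    return hits_in_top / total_hits


def enrichment_factor(scores, is_hit, fraction=0.01):
    """Hit rate in the top slice divided by the hit rate of the whole library."""
    hits_in_top, n_top, total_hits, n_total = _top_counts(scores, is_hit, fraction)
    return (hits_in_top / n_top) / (total_hits / n_total)


# Toy library of 1000 compounds (made-up data): compound i has score 1000 - i,
# and five compounds are labeled as experimental hits.
toy_scores = [1000 - i for i in range(1000)]
toy_hit_indices = {0, 3, 50, 200, 999}
toy_is_hit = [i in toy_hit_indices for i in range(1000)]

# Two of the five hits (indices 0 and 3) fall within the top 10 compounds.
print(hit_recovery_at_fraction(toy_scores, toy_is_hit, 0.01))
print(enrichment_factor(toy_scores, toy_is_hit, 0.01))
```

Hit recovery answers "how many of the hits would have been found by testing only the top slice", while the enrichment factor compares the hit rate in that slice against random selection from the same library.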
We propose HydraScreen, a deep-learning approach that aims to provide a framework for more robust machine-learning-accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network, designed for the effective representation of molecular structures and interactions in protein-ligand binding. We design an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assess our approach using established public benchmarks based on the CASF 2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's r = 0.86, RMSE = 1.15, Top-1 = 0.95). Furthermore, we utilize a novel interaction profiling approach to identify potential biases in the model and dataset to boost interpretability and support the unbiased nature of our method. Finally, we showcase HydraScreen's capacity to generalize across unseen proteins and ligands, offering directions for future development of robust machine learning scoring functions. HydraScreen (accessible at https://hydrascreen.ro5.ai) provides a user-friendly GUI and a public API, facilitating easy assessment of individual protein-ligand complexes.
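The three numbers quoted for the CASF 2016 benchmark (Pearson's r, RMSE and Top-1) can be made concrete with a small sketch. The code below is a minimal, assumption-laden illustration rather than the benchmark's or HydraScreen's own evaluation code: the function names and input layout are hypothetical, and the 2 Å RMSD cutoff used to decide whether the top-scored pose counts as a success is an assumed convention for docking-power style metrics.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured affinities."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured affinities."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def top1_success_rate(complexes, rmsd_cutoff=2.0):
    """Fraction of complexes whose best-scored pose lies within the cutoff.

    `complexes` is a list with one entry per protein-ligand complex; each
    entry is a list of (model_score, rmsd_to_native_pose) tuples, one per
    candidate pose. Higher score = preferred pose (assumption).
    """
    successes = 0
    for poses in complexes:
        _, best_rmsd = max(poses, key=lambda pose: pose[0])
        if best_rmsd <= rmsd_cutoff:
            successes += 1
    return successes / len(complexes)


# Made-up numbers purely for illustration.
example_predicted = [6.1, 7.4, 5.0, 8.2]
example_measured = [6.0, 7.9, 4.6, 8.0]
example_complexes = [
    [(0.9, 1.0), (0.5, 5.0)],  # top-scored pose is near-native
    [(0.2, 0.5), (0.8, 4.0)],  # top-scored pose is far from native
]

print(pearson_r(example_predicted, example_measured))
print(rmse(example_predicted, example_measured))
print(top1_success_rate(example_complexes))
```

Pearson's r and RMSE describe affinity prediction (how well predictions track and match measured values), whereas Top-1 describes pose prediction (how often the model's preferred pose is close to the experimentally observed one).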
In this study, we integrate Ro5’s target evaluation tool SpectraView and DL-driven virtual screening tool HydraScreen with Strateos' robotic cloud lab high-throughput screening platform to accelerate target and hit identification. Using SpectraView to select IRAK1 as the target, we prospectively validate HydraScreen, a structure-based deep learning model. We demonstrate that HydraScreen could identify up to 23.8% of all IRAK1 hits in the top 1% of the ranked compounds, simultaneously identifying the three most potent (nanomolar) scaffolds present in the library. The three nanomolar scaffolds identified in our project are novel for IRAK1 and lend themselves to future development. HydraScreen outperforms traditional virtual screening methods in an unbiased prospective evaluation and offers advanced features such as ligand pose confidence scoring. Thus, SpectraView and HydraScreen are innovative tools that can aid and expedite the early stages of drug discovery.
We propose HydraScreen, a deep-learning framework for safe and robust accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network designed for the effective representation of molecular structures and interactions in protein-ligand binding. We designed an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assessed our approach using established public benchmarks based on the CASF-2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's
This study, focusing on predicting Absorption, Distribution, Metabolism, Excretion, and Toxicology (ADMET) properties, addresses the key challenges of ML models trained using ligand-based representations. We propose a structured approach to data feature selection, taking a step beyond the conventional practice of combining different representations without systematic reasoning. Additionally, we enhance model evaluation methods by integrating cross-validation with statistical hypothesis testing, adding a layer of reliability to the model assessments. Our final evaluations include a practical scenario, where models trained on one source of data are evaluated on a different one. This approach aims to bolster the reliability of ADMET predictions, providing more dependable and informative model evaluations.
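To illustrate what combining cross-validation with statistical hypothesis testing can look like, the sketch below compares two models using their per-fold scores. The abstract does not say which test the study uses; a paired sign-flip permutation test is swapped in here because it needs no external libraries. The function name, the fold scores and the choice of test are all assumptions made for illustration.

```python
from itertools import product


def paired_permutation_pvalue(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired per-fold scores.

    Null hypothesis: the two models perform equally, so the sign of each
    per-fold difference is arbitrary. The p-value is the share of all sign
    assignments whose absolute summed difference is at least as large as
    the observed one.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need one score per fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    # Enumerate every way of flipping the signs of the fold differences.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        if statistic >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical scores from the same 5 cross-validation folds for two models
# trained on different feature representations (made-up numbers).
model_a_folds = [0.80, 0.82, 0.79, 0.81, 0.83]
model_b_folds = [0.75, 0.78, 0.73, 0.78, 0.78]

print(paired_permutation_pvalue(model_a_folds, model_b_folds))
```

With only five folds the smallest attainable two-sided p-value is 2/32, which is one reason such comparisons often rely on repeated cross-validation or more folds before a difference between models is treated as reliable.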