Despite recent advances in computational protein science, the dynamic behavior of proteins, which directly governs their biological activity, cannot be gleaned from sequence information alone. To overcome this challenge, we propose a framework that integrates peptide sequence, protein structure, and protein dynamics descriptors into machine learning algorithms to improve the prediction of protein variant function. The resulting machine learning pipeline combines traditional sequence and structure information with molecular dynamics simulation data to predict how multiple point mutations affect the fold improvement in activity of bovine enterokinase variants. This study highlights how the combination of structural and dynamic data can provide predictive insights into protein functionality and address protein engineering challenges in industrial contexts.
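The idea of feeding sequence, structure, and dynamics descriptors jointly into a learning algorithm can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the mutation labels, the descriptor values, the helper names (`encode_mutations`, `build_features`, `predict_nearest`), and the use of a one-nearest-neighbor predictor as a stand-in for whatever model the study actually employs.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_mutations(mutations):
    """Count substituted residues over the 20 amino acids.

    Each mutation is a string such as 'A10G' (hypothetical label);
    only the final letter, the new residue, is encoded here.
    """
    counts = [0.0] * len(AMINO_ACIDS)
    for mutation in mutations:
        counts[AMINO_ACIDS.index(mutation[-1])] += 1.0
    return counts

def build_features(mutations, structure_desc, dynamics_desc):
    """Concatenate sequence, structure, and dynamics descriptors into one vector."""
    return encode_mutations(mutations) + list(structure_desc) + list(dynamics_desc)

def predict_nearest(training_set, query):
    """Return the label of the training vector closest to the query (1-NN)."""
    _, label = min(training_set, key=lambda item: math.dist(item[0], query))
    return label

# Toy, made-up data: (feature vector, fold improvement in activity).
training_set = [
    (build_features(["A10G"], [0.2], [0.5]), 1.0),
    (build_features(["A10G", "L25V"], [0.8], [1.5]), 3.0),
]
query = build_features(["A10G", "L25V"], [0.7], [1.4])
prediction = predict_nearest(training_set, query)
```

In practice the structure and dynamics descriptors would come from the variant models and the molecular dynamics trajectories, and the nearest-neighbor step would be replaced by a properly validated regressor.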
Understanding the molecular recognition of small molecules by proteins in atomistic detail is key for drug design. Molecular docking is a widely used computational method to mimic ligand–protein association in silico. However, predicting the conformational changes that occur in proteins upon ligand binding is still a major challenge. Ensemble docking approaches address this issue by considering a set of different conformations of the protein obtained either experimentally or from computer simulations, e.g., molecular dynamics. Yet holo-like structures, i.e., those able to host the correct ligands, are generally poorly sampled by standard molecular dynamics simulations of the apo protein. To address this limitation, we introduce ensemble docking with enhanced sampling of pocket shape (EDES), a computational approach based on metadynamics simulations that generates holo-like conformations of proteins by exploiting only their apo structures. This is achieved by defining a set of collective variables that effectively sample different shapes of the binding site, ultimately mimicking the steric effect due to the ligand. We assessed the method on three challenging proteins undergoing different extents of conformational change upon ligand binding. In all cases, our protocol generates a significant fraction of structures featuring a low RMSD from the experimental holo geometry. Moreover, ensemble docking calculations using those conformations yielded, in all cases, native-like poses among the top-ranked ones.
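The assessment criterion mentioned above, the fraction of generated structures with a low RMSD from the experimental holo geometry, can be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are taken to be already superimposed on the holo structure, the cutoff value and the toy coordinates are made up, and the function names (`rmsd`, `holo_like_fraction`) are hypothetical rather than part of the EDES protocol.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) points.

    Assumes the two structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))

def holo_like_fraction(ensemble, holo, cutoff):
    """Fraction of conformations whose RMSD from the holo geometry is <= cutoff."""
    hits = sum(1 for conformation in ensemble if rmsd(conformation, holo) <= cutoff)
    return hits / len(ensemble)

# Toy binding-site coordinates (made up, two atoms per conformation).
holo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to holo
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # shifted by 1 along z
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],  # shifted by 3 along y
]
fraction = holo_like_fraction(ensemble, holo, cutoff=1.5)  # illustrative cutoff
```

For real trajectories, the RMSD would typically be restricted to binding-site atoms and computed after optimal superposition with a dedicated structural analysis library.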