Proteins interact with numerous water molecules to perform their physiological functions in biological organisms. Most water molecules act as solvent media; hence, their roles may be considered implicitly in theoretical treatments of protein structure and function. However, some water molecules interact intimately with proteins and require explicit treatment to understand their effects. Most physics-based computational methods are limited in their ability to accurately locate water molecules on protein surfaces because of inaccurate energy functions. Instead of relying on an energy function, this study attempts to learn the locations of water molecules from structural data. GalaxyWater-CNN predicts water positions on protein chains, protein–protein interfaces, and protein–compound binding sites using a three-dimensional convolutional neural network (3D-CNN) model that is trained to generate a water score map on a given protein structure. The training data are compiled from high-resolution protein crystal structures resolved together with water molecules. GalaxyWater-CNN shows improved water prediction performance both in the coverage of crystal water molecules and in the accuracy of the predicted water positions when compared with previous energy-based methods. The method performs particularly well in precisely predicting water molecules that form hydrogen-bond networks. The web service and the source code of this water prediction method are freely available at https://galaxy.seoklab.org/gwcnn and https://github.com/seoklab/GalaxyWater-CNN, respectively.
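The step from a water score map to discrete water positions, mentioned in the GalaxyWater-CNN abstract above, can be illustrated with a minimal sketch. It is not the published GalaxyWater-CNN procedure: the function name `place_waters`, the score threshold, the grid spacing, and the minimum water-water distance are all hypothetical values chosen for illustration, and the greedy peak picking shown here is only one plausible way to read positions off a 3D grid.

```python
import numpy as np

def place_waters(score_map, origin, spacing, threshold=0.5, min_dist=2.0):
    """Greedy peak picking on a 3D score grid (hypothetical post-processing).

    score_map : 3D array of per-voxel water scores
    origin    : Cartesian coordinates (in angstroms) of voxel (0, 0, 0)
    spacing   : grid spacing in angstroms
    threshold : assumed minimum score for a voxel to be a candidate
    min_dist  : assumed minimum distance between two placed waters
    """
    score_map = np.asarray(score_map, dtype=float)
    origin = np.asarray(origin, dtype=float)

    # Candidate voxels are those whose score exceeds the threshold.
    idx = np.argwhere(score_map > threshold)
    if idx.size == 0:
        return np.empty((0, 3)), np.empty((0,))

    # Visit candidates from highest to lowest score.
    scores = score_map[tuple(idx.T)]
    order = np.argsort(-scores)
    coords = origin + idx[order] * spacing
    scores = scores[order]

    # Keep a candidate only if it is far enough from every water kept so far.
    kept = []
    for i, xyz in enumerate(coords):
        if all(np.linalg.norm(xyz - coords[j]) >= min_dist for j in kept):
            kept.append(i)
    return coords[kept], scores[kept]

# Toy grid: two nearby peaks (only the stronger survives) and one distant peak.
grid = np.zeros((10, 10, 10))
grid[2, 2, 2] = 0.9
grid[2, 2, 3] = 0.8
grid[8, 8, 8] = 0.7
waters, water_scores = place_waters(grid, origin=(0.0, 0.0, 0.0), spacing=0.5)
```

In this toy example the voxel at index (2, 2, 3) is suppressed because it lies 0.5 Å from a higher-scoring candidate, so two positions are returned.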
Low-density lipoprotein receptor-related protein 6 (LRP6) is a coreceptor of the β-catenin-dependent Wnt signaling pathway. The LRP6 ectodomain binds Wnt proteins, as well as Wnt inhibitors such as sclerostin (SOST), which negatively regulates Wnt signaling in osteocytes. Although LRP6 ectodomain 1 (E1) is known to interact with SOST, several questions remain unresolved, such as why SOST binds to LRP6 E1E2 with higher affinity than to the E1 domain alone. Here, we present the crystal structure of the LRP6 E1E2–SOST complex with two interaction sites in tandem. The unexpected additional binding site was identified between the C-terminus of SOST and the LRP6 E2 domain. This interaction was confirmed by in vitro binding and cell-based signaling assays. Its functional significance was further demonstrated in vivo using Xenopus laevis embryos. Our results provide insights into the inhibitory mechanism of SOST on Wnt signaling.
The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) presents a public health crisis, and vaccines that can induce highly potent neutralizing antibodies are essential for ending the pandemic. The spike (S) protein on the viral envelope mediates human angiotensin-converting enzyme 2 (ACE2) binding and thus is the target of a variety of neutralizing antibodies. In this work, we built various S trimer-antibody complex structures on the basis of the fully glycosylated S protein models described in our previous work, and performed all-atom molecular dynamics simulations to gain insight into the structural dynamics and interactions between the S protein and antibodies. Investigation of the residues critical for S-antibody binding allows us to predict the potential influence of mutations in SARS-CoV-2 variants. Comparison of the glycan conformations between S-only and S-antibody systems reveals the roles of glycans in S-antibody binding. In addition, we explored the antibody binding modes and the influence of antibodies on the motion of the S protein receptor binding domains. Overall, our analyses provide a better understanding of S-antibody interactions, and the simulation-based S-antibody interaction maps could be used to predict the influence of S mutations on S-antibody interactions, which will be useful for the development of vaccines and antibody-based therapies.
Protein-protein interactions play crucial roles in diverse biological processes, including various disease progressions. Atomistic structural details of protein-protein interactions may provide important information that can facilitate the design of therapeutic agents. GalaxyHeteromer is a freely available automatic web server (http://galaxy.seoklab.org/heteromer) that predicts protein heterodimer complex structures from two subunit protein sequences or structures. When subunit structures are unavailable, they are predicted by template- or distance-prediction-based modelling methods. Heterodimer complex structures can be predicted by either template-based or ab initio docking, depending on template availability. Structural templates are detected from the protein structure database based on both sequence and structure similarities. The templates for heterodimers may be selected from monomer and homo-oligomer structures, as well as from hetero-oligomers, owing to the evolutionary relationships of heterodimers with domains of monomers or subunits of homo-oligomers. In addition, the server employs one of the best ab initio docking methods when heterodimer templates are unavailable. The multiple heterodimer structure models and the associated scores, which are provided by the web server, may be further examined by users to test or develop functional hypotheses or to design new functional molecules.
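The branching described in the GalaxyHeteromer abstract above (model missing subunits first, then dock by template if one exists, otherwise ab initio) can be summarized in a short sketch. This is only an illustration of the decision flow as stated in the abstract, not the server's implementation: the names `Subunit` and `plan_prediction` and the two boolean flags are hypothetical, and the rule that template-based modelling is preferred over distance-prediction-based modelling whenever a subunit template is found is an assumption, since the abstract does not say how the server chooses between them.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Subunit:
    sequence: str
    structure: Optional[str] = None  # hypothetical: path to a user-supplied structure file

def plan_prediction(a: Subunit, b: Subunit,
                    subunit_template_found: bool,
                    complex_template_found: bool) -> List[str]:
    """Return the ordered steps a heterodimer prediction run would take."""
    steps = []
    for name, sub in (("A", a), ("B", b)):
        if sub.structure is None:
            # Assumption: prefer a template when one is available.
            method = ("template-based modelling" if subunit_template_found
                      else "distance-prediction-based modelling")
            steps.append(f"model subunit {name}: {method}")
        else:
            steps.append(f"use supplied structure for subunit {name}")
    # Complex assembly depends on whether a heterodimer template was detected.
    steps.append("template-based docking" if complex_template_found
                 else "ab initio docking")
    return steps

# One subunit given as sequence only, the other with a structure; no templates found.
plan = plan_prediction(Subunit("MKT"), Subunit("GSH", structure="b.pdb"),
                       subunit_template_found=False,
                       complex_template_found=False)
```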
We present the quality assessment of 5613 models submitted by predictor groups from both CAPRI and CASP for a total of 15 most tractable targets from the second joint CASP-CAPRI protein assembly prediction experiment. These targets comprised 12 homo-oligomers and 3 hetero-complexes. The bulk of the analysis focuses on 10 targets (of CAPRI Round 37), which included all 3 hetero-complexes, and whose protein chains or the full assembly could be readily modeled from structural templates in the PDB. On average, 28 CAPRI groups and 10 CASP groups (including automatic servers) submitted models for each of these 10 targets. Additionally, about 16 groups participated in the CAPRI scoring experiments. A range of acceptable to high quality models were obtained for 6 of the 10 Round 37 targets, for which templates were available for the full assembly. Poorer results were achieved for the remaining targets due to the lower quality of the templates available for the full complex or the individual protein chains, highlighting the unmet challenge of modeling the structural adjustments of the protein components that occur upon binding or which must be accounted for in template-based modeling. On the other hand, our analysis indicated that residues in binding interfaces were correctly predicted in a sizable fraction of otherwise poorly modeled assemblies, with higher accuracy than published methods that do not use information on the binding partner. Lastly, the strengths and weaknesses of the assessment methods are evaluated and improvements are suggested.
Atomic-level knowledge of protein-ligand interactions allows a detailed understanding of protein functions and provides critical clues to discovering molecules that regulate those functions. While recent innovative deep learning methods for protein structure prediction have dramatically increased the structural coverage of the human proteome, molecular interactions remain largely unknown. A new database, HProteome-BSite, provides predictions of binding sites and ligands in the enlarged 3D human proteome. The model structures for human proteins from the AlphaFold Protein Structure Database were processed into structural domains of high confidence to maximize the coverage and reliability of interaction prediction. For ligand binding site prediction, an updated version of the template-based method GalaxySite was used. The high performance of the updated GalaxySite was confirmed. HProteome-BSite covers 80.74% of the UniProt entries in the AlphaFold human 3D proteome. Predicted binding sites and binding poses of potential ligands are provided for effective application to further functional studies and drug discovery. The HProteome-BSite database is available at https://galaxy.seoklab.org/hproteome-bsite/database and is free and open to all users.