An empirical approach for structure-based prediction of carbohydrate-binding sites on proteins

Clara Shionyu-Mitsuyama, Tsuyoshi Shirai, Hirokazu Ishida, Takashi Yamane

2003

A computer program system was developed to predict carbohydrate-binding sites on three-dimensional (3D) protein structures. The programs search for binding sites by referring to the empirical rules derived from the known 3D structures of carbohydrate–protein complexes. A total of 80 non-redundant carbohydrate–protein complex structures were selected from the Protein Data Bank for the empirical rule construction. The performance of the prediction system was tested on 50 known complex structures to determine whether the system could detect the known binding sites. The known monosaccharide-binding sites were detected among the best three predictions in 59% of the cases, which covered 69% of the polysaccharide-binding sites in the target proteins, when the performance was evaluated by the overlap between residue patches of predicted and known binding sites.

Keywords: computer application/drug design/ligand prediction/molecular interaction/sugar

Introduction

The number of known three-dimensional (3D) protein structures has been increasing rapidly, and the rate of increase will be further enhanced by the current structural genomics projects (Berman et al., 2000; Yokoyama et al., 2000; Westbrook et al., 2003; Zhang and Kim, 2003). Thus, many 3D protein structures lacking functional details are being accumulated. Therefore, methods for deriving functional information from 3D protein structures are required. One of the most important pieces of information that 3D protein structures can confer is the mechanism of molecular interaction. The prediction of a function from a 3D protein structure should be based on a prediction of the ligand and the mode of interaction with the ligand.

Polysaccharides (carbohydrates) are often referred to as the third molecular chain of life.
The functional roles of polysaccharides and their interactions with proteins are drawing more attention than before, since it has been recognized that carbohydrates are used as information carriers rather than simple storage material. Carbohydrate–protein interactions are involved in a variety of biological activities, including immune responses, cell–cell recognition and cell adhesion (Weis et al., 1992; Feizi, 1993; Lis and Sharon, 1998; Kogelberg and Feizi, 2001). Since polysaccharides assume a large variety of configurations, their potential for information encoding might be greater than that of peptides or nucleotides (Laine, 1994). Therefore, many carbohydrate-binding proteins are being considered as targets for new medicines (Beuth et al., 1994; Axford, 1997). Computer-aided predictions of carbohydrate–protein interactions might facilitate the rational design of drugs with activities against the proteins.

Although carbohydrate–protein interactions have been analyzed in several studies of the mechanisms of protein–carbohydrate recognition (Rini, 1995; Weis and Drickamer, 1996; Elgavish and Shaanan, 1997; Rao et al., 1998; Garcia-Hernandez and Hernandez-Arana, 1999; Garcia-Hernandez et al., 2000; Clarke et al., 2001; Neumann et al., 2002), only one computer application has been reported that is specifically aimed at predicting carbohydrate-binding sites (Taroni et al., 2000). Compared with the abundance of methodologies developed for protein–nucleic acid (Kono and Sarai, 1999) or protein–protein interactions (Jones and Thornton, 1997a,b; Ishida et al., 2000), there are still very few methods for predicting carbohydrate–protein interactions.

The rapid expansion of the structure databases provides an opportunity to use knowledge-based approaches for prediction. Several hundred structures of carbohydrate–protein complexes are currently available in the Protein Data Bank (PDB) (Zhang and Kim, 2003).
The previously reported system used the statistics of amino acid propensity at carbohydrate-binding sites (Taroni et al., 2000). Patches of residues on a test protein molecule were ranked by the average of the propensity score over the residues. In 63% of the test proteins, at least one of the best three predicted patches was found to have considerable overlap with the real one. This result demonstrated that the patch and propensity approach was valid for prediction.

It is known that certain amino acid residues in carbohydrate-binding sites show characteristic spatial distributions around saccharide moieties (Drickamer, 1992; Iobst and Drickamer, 1994; Rini, 1995; Kolatkar and Weis, 1996; Weis and Drickamer, 1996; Elgavish and Shaanan, 1997; Taroni et al., 2000). This suggests that the coordinates of the carbohydrate-binding residues can be explicitly used for predictions. In this study, a program system has been developed that uses the empirical rules of the spatial distribution of protein atoms at known carbohydrate-binding sites for prediction, and the performance of the system was tested on 50 known carbohydrate–protein complexes.

Materials and methods

Overview of the prediction system

A schematic overview of the prediction system is shown in Figure 1. The system consists of two components: the programs for the construction of empirical rules and those for the sugar-binding-site search. The former programs require a set of known 3D carbohydrate–protein complex structures as the
