Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases

2016 
Abstract Thousands of protein structures of unknown or uncertain function have been reported as a result of high-throughput structure determination techniques developed by Structural Genomics (SG) projects. However, many of the putative functional assignments of these SG proteins in the Protein Data Bank (PDB) are incorrect. While high-throughput biochemical screening techniques have provided valuable functional information for limited sets of SG proteins, the biochemical functions for most SG proteins are still unknown or uncertain. Therefore, computational methods for the reliable prediction of protein function from structure can add tremendous value to the existing SG data. In this article, we show how computational methods may be used to predict the function of SG proteins, using examples from the six-hairpin glycosidase (6-HG) and the concanavalin A-like lectin/glucanase (CAL/G) superfamilies. Using a set of predicted functional residues, obtained from computed electrostatic and chemical properties for each protein structure, it is shown that these superfamilies may be sorted into functional families according to biochemical function. Within these superfamilies, a total of 18 SG proteins were analyzed according to their predicted, local functional sites: 13 from the 6-HG superfamily, five from the CAL/G superfamily. Within the 6-HG superfamily, an uncharacterized protein BACOVA_03626 from Bacteroides ovatus (PDB 3ON6 ) and a hypothetical protein BT3781 from Bacteroides thetaiotaomicron (PDB 2P0V ) are shown to have very strong active site matches with exo-α-1,6-mannosidases, thus likely possessing this function. Also in this superfamily, it is shown that protein BH0842, a putative glycoside hydrolase from Bacillus halodurans (PDB 2RDY ), has a predicted active site that matches well with a known α- l -galactosidase. In the CAL/G superfamily, an uncharacterized glycosyl hydrolase family 16 protein from Mycobacterium smegmatis (PDB 3RQ0 ) is shown to have local structural similarity at the predicted active site with the known members of the GH16 family, with the closest match to the endoglucanase subfamily. The method discussed herein can predict whether an SG protein is correctly or incorrectly annotated and can sometimes provide a reliable functional annotation. Examples of application of the method across folds, comparing active sites between two proteins of different structural folds, are also given.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    59
    References
    4
    Citations
    NaN
    KQI
    []