Identification of function-associated loop motifs and application to protein function prediction
Citations: 45 · References: 61 · Related papers: 10
Abstract:
Motivation: The detection of function-related local 3D-motifs in protein structures can provide insights into protein function in the absence of sequence or fold similarity. Protein loops are known to play important roles in protein function, and several loop classifications have been described, but the automated identification of putative functional 3D-motifs in such classifications has not yet been addressed. Such identification can be applied to sequence annotation.
Results: We evaluated three different scoring methods for their ability to identify known motifs from the PROSITE database in ArchDB. More than 500 new putative function-related motifs not reported in PROSITE were identified. Sequence patterns derived from these motifs were especially useful for predicting precise annotations. The number of reliable sequence annotations could be increased by up to 100% with respect to standard BLAST.
Contact: boliva@imim.es
Supplementary information: Supplementary Data are available at Bioinformatics online.
Keywords:
Protein function prediction
Identification
Sequence (biology)
Protein sequencing
Sequence motif
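The abstract above mentions sequence patterns derived from motifs and compared against PROSITE. As a minimal sketch of how a pattern written in PROSITE syntax can be scanned against a sequence, the snippet below translates the pattern into a Python regular expression. The function name `prosite_to_regex` and the example sequence are illustrative assumptions, and this is not the scoring or pattern-derivation procedure used by the authors.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-syntax pattern into a Python regex string.

    Handles: 'x' (any residue), [ABC] (allowed set), {ABC} (forbidden set),
    (n) / (n,m) repetition, '<' / '>' terminal anchors, trailing period.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        repeat = ""
        if low and high:
            repeat = "{" + low + "," + high + "}"
        elif low:
            repeat = "{" + low + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# Illustrative pattern in PROSITE syntax and an illustrative sequence fragment.
regex = prosite_to_regex("[AG]-x(4)-G-K-[ST].")
hit = re.search(regex, "MTEYKLVVVGAGGVGKSALT")
if hit:
    print(regex, hit.start(), hit.group())
```

Scanning with a compiled regex only reports exact pattern matches; it does not score partial matches or structural context, which the loop-motif approach described above relies on.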
Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new PNPRD features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. Our results indicate that sequence-based classifiers can provide good results among a broad range of proteins, that the proposed features are useful in predicting several functions, and that the combination of our and traditional features may support the creation of a discriminative feature set for specific protein functions.
Protein function prediction
Sequence (biology)
Protein sequencing
Similarity (geometry)
Identification
Citations (51)
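The abstract above describes features built from positively and negatively charged residues over full and local regions of a sequence, but does not define them. The following is only a rough sketch of that general idea: it computes the fraction of charged residues over the whole sequence and over N- and C-terminal windows. The residue sets (K/R as positive, D/E as negative, histidine ignored), the window size, and all names are assumptions for illustration and are not the PNPRD features of the cited paper.

```python
POSITIVE = set("KR")  # assumption: histidine is not counted as positive
NEGATIVE = set("DE")

def charge_fractions(segment: str) -> tuple[float, float]:
    """Return (fraction positive, fraction negative) for a sequence segment."""
    if not segment:
        return 0.0, 0.0
    length = len(segment)
    positive = sum(residue in POSITIVE for residue in segment)
    negative = sum(residue in NEGATIVE for residue in segment)
    return positive / length, negative / length

def charge_features(sequence: str, window: int = 10) -> dict[str, float]:
    """Global and terminal-window charge features (window must be positive)."""
    seq = sequence.upper()
    regions = (
        ("full", seq),             # global view of the sequence
        ("nterm", seq[:window]),   # local view: first `window` residues
        ("cterm", seq[-window:]),  # local view: last `window` residues
    )
    features = {}
    for name, segment in regions:
        positive, negative = charge_fractions(segment)
        features[f"{name}_pos"] = positive
        features[f"{name}_neg"] = negative
        features[f"{name}_net"] = positive - negative
    return features

print(charge_features("KKDDAAAAAA", window=4))
```

A feature dictionary of this kind could then be passed to any classifier with feature selection; the abstract reports random forests, though the exact setup is not given there.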
Multiple sequence alignments have much to offer to the understanding of protein structure, evolution and function. We are developing approaches to use this information in predicting protein-binding specificity, intra-protein and protein-protein interactions, and in reconstructing protein interaction networks.
Protein function prediction
Sequence (biology)
Protein sequencing
Protein structure database
Multiple sequence alignment
Citations (9)
Many protein features useful for prediction of protein function can be predicted from sequence, including posttranslational modifications, subcellular localization, and physical/chemical properties. We show here that such protein features are more conserved among orthologs than paralogs, indicating they are crucial for protein function and thus subject to selective pressure. This means that a function prediction method based on sequence-derived features may be able to discriminate between proteins with different function even when they have highly similar structure. Also, such a method is likely to perform well on organisms other than the one on which it was trained. We evaluate the performance of such a method, ProtFun, which relies on protein features as its sole input, and show that the method gives similar performance for most eukaryotes and performs much better than anticipated on archaea and bacteria. From this analysis, we conclude that for the posttranslational modifications studied, both the cellular use and the sequence motifs are conserved within Eukarya.
Conserved sequence
Sequence (biology)
Protein sequencing
Protein function prediction
Sequence space
Protein methods
Citations (44)
Annotation of protein functions plays an important role in understanding life at the molecular level. High‐throughput sequencing produces massive numbers of raw protein sequences, and only about 1% of them have been manually annotated with functions. Experimental annotation of functions is expensive and time‐consuming, and does not keep up with the rapid growth in the number of sequences. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence‐ and network‐derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs, collected with the InterPro tool, that are associated with the input protein sequence. This vector is processed with two neural layers to obtain a low‐dimensional vector, which is combined with topological information extracted from protein–protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.
Benchmark (surveying)
Protein sequencing
ENCODE
Protein function prediction
Sequence (biology)
Citations (92)
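The DeepFunc abstract above describes encoding the domains, families, and motifs associated with a protein as a long, sparse binary vector. A minimal sketch of such an encoding is shown below, assuming each protein has already been annotated with a list of entry identifiers. The protein names and identifiers are placeholders, and the helper names `build_vocabulary` and `encode` are hypothetical; this is not DeepFunc's actual preprocessing code.

```python
def build_vocabulary(annotations: dict[str, list[str]]) -> dict[str, int]:
    """Assign a fixed vector position to every entry identifier seen."""
    identifiers = sorted({entry for entries in annotations.values() for entry in entries})
    return {entry: index for index, entry in enumerate(identifiers)}

def encode(entries: list[str], vocabulary: dict[str, int]) -> list[int]:
    """Binary vector: 1 where the protein carries the entry, 0 elsewhere."""
    vector = [0] * len(vocabulary)
    for entry in entries:
        index = vocabulary.get(entry)
        if index is not None:  # identifiers outside the vocabulary are skipped
            vector[index] = 1
    return vector

# Placeholder annotations; the identifiers are illustrative only.
annotations = {
    "protA": ["IPR000001", "IPR000003"],
    "protB": ["IPR000002"],
}
vocabulary = build_vocabulary(annotations)
for protein, entries in annotations.items():
    print(protein, encode(entries, vocabulary))
```

With a realistic vocabulary the vector has tens of thousands of positions and very few ones, which is why the abstract describes compressing it with neural layers before combining it with network-derived information.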
The prediction of a protein's function from its amino acid sequence is one of the most important tasks in bioinformatics. The traditional procedure of searching databases for related sequences and inferring the function from the best matches has several shortcomings and pitfalls. Alternatively, the sequence under study can be scrutinized for the occurrence of particular sequence signatures that can be associated with certain protein functionalities. Useful sequence signatures not only include short motifs such as protein modification sites or specific binding motifs but also encompass larger protein regions, such as homology domains. There exist a number of fundamentally different bioinformatical data structures, which can be used to store information about sequence signatures, thus making them available for the purpose of protein classification.
Sequence (biology)
Protein function prediction
Protein sequencing
Sequence logo
Sequence motif
Sequence homology
Homology
Protein structure database
Citations (0)
Protein function is mediated by different amino acid residues, both their positions and types, in a protein sequence. Some amino acids are responsible for the stability or overall shape of the protein, playing an indirect role in protein function. Others play a functionally important role as part of active or binding sites of the protein. For a given protein sequence, the residues and their degree of functional importance can be thought of as a signature representing the function of the protein. We have developed a combination of knowledge- and biophysics-based function prediction approaches to elucidate the relationships between the structural and the functional roles of individual residues and positions. Such a meta-functional signature (MFS), which is a collection of continuous values representing the functional significance of each residue in a protein, may be used to study proteins of known function in greater detail and to aid in experimental characterization of proteins of unknown function. We demonstrate the superior performance of MFS in predicting protein functional sites and also present four real-world examples to apply MFS in a wide range of settings to elucidate protein sequence-structure-function relationships. Our results indicate that the MFS approach, which can combine multiple sources of information and also give biological interpretation to each component, greatly facilitates the understanding and characterization of protein function.
Protein function prediction
Protein sequencing
Sequence (biology)
Protein design
Citations (38)
Discriminative model
Sequence motif
Protein function prediction
Sequence (biology)
Protein sequencing
Motif (music)
Conserved sequence
Citations (54)
Accurately identifying functional sites in proteins is one of the most important topics in bioinformatics and systems biology. In bioinformatics, identifying protease cleavage sites in protein sequences can aid drug/inhibitor design. In systems biology, post-translational protein-protein interaction activity is one of the major components for analyzing signaling pathway activities. Determining functional sites using laboratory experiments is normally time-consuming and expensive. Computer programs have therefore been widely used for this kind of task. Mining protein sequence data using computer programs covers two major issues: 1) discovering how amino acid specificity affects functional sites and 2) discovering what amino acid specificity is. Both need a proper coding mechanism prior to using a proper machine learning algorithm. The development of the bio-basis function neural network (BBFNN) has opened a new way for protein sequence data mining. The bio-basis function used in BBFNN is biologically sound in that it codes the biological information in protein sequences well, i.e. it measures the similarity between protein sequences well. BBFNN has therefore outperformed conventional neural networks in many areas of protein sequence data mining, from protease cleavage site prediction to disordered protein identification. This review focuses on the variants of BBFNN and their applications in mining protein sequence data.
Protein sequencing
Protein function prediction
Biological data
Citations (13)