    PFP-GO: Integrating protein sequence, domain and protein-protein interaction information for protein function prediction using ranked GO terms
    Citations (5), References (87), Related Papers (10)
    Abstract:
    Protein function prediction is emerging as an essential field in biological and computational studies. Although computational approaches are now well established, combining information from multiple sources has been observed to be more effective than relying on a single source. On this basis, a methodology, PFP-GO, is proposed in which heterogeneous sources (protein sequence, protein domain, and protein-protein interaction network) are processed separately to rank each individual functional GO term. Based on this ranking, GO terms are propagated to the target proteins. During GO ranking, protein sequence contributes sequence-based information, while protein domains and protein-protein interaction networks contribute structural/functional and topological information, respectively. PFP-GO is evaluated using Precision, Recall, and F-Score, and it performs reasonably better than existing state-of-the-art methods, achieving an overall Precision, Recall, and F-Score of 0.67, 0.58, and 0.62, respectively. Furthermore, we examine some of the top-ranked GO terms predicted by PFP-GO through multilayer network propagation that affect the 3D structure of the genome. The complete source code of PFP-GO is freely available at https://sites.google.com/view/pfp-go/ .
    Keywords:
    Sequence (biology)
    Protein function prediction
    F1 score
    Protein sequencing
    Interaction network
    Footprint
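The reported F-Score can be checked against the reported Precision and Recall, since the F-Score is their harmonic mean. The sketch below is only an illustration: the function names and the example GO term sets are hypothetical, and the abstract does not say how PFP-GO averages these metrics across proteins.

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Precision and recall of a predicted GO term set against the true set."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: 2 * 0.67 * 0.58 / (0.67 + 0.58) = 0.62176
print(round(f_score(0.67, 0.58), 2))  # 0.62

# Hypothetical GO term sets for a single protein (illustrative only).
predicted_terms = {"GO:0005515", "GO:0003677", "GO:0006355"}
actual_terms = {"GO:0005515", "GO:0006355", "GO:0005634", "GO:0008270"}
p, r = precision_recall(predicted_terms, actual_terms)
print(p, r, f_score(p, r))
```

The reported values are mutually consistent: 0.67 and 0.58 give an F-Score of about 0.62.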
    High-throughput experimental technologies for protein interaction continue to reshape systems biology and make large-scale data available. Protein-protein interactions measured on these experimental platforms, however, present numerous production and bioinformatics challenges. Tasks such as functional module identification, protein complex prediction, protein function prediction, and disease-related gene prioritization have become increasingly problematic in the analysis of protein-protein interaction networks. The development of powerful, efficient prediction methods for analyzing the structure and function of protein interaction networks is critical for the research community to accelerate research and publications. Currently, network-based approaches are drawing the most attention in analyzing protein interactions. This review describes the state of the art in network-based strategies and applications for inferring protein interactions.
    Protein Interaction Networks
    Protein function prediction
    Identification
    Interaction network
    Prioritization
    Citations (0)
    Predicting protein function is one of the most challenging problems of the post-genomic era. The development of experimental methods for genome-scale analysis of molecular interaction networks has provided new approaches to inferring protein function. Various approaches are available for deducing the function of uncharacterized proteins from protein information. This paper presents a reliable method for assigning protein function based on the network of physical interactions. Its key characteristic is that function assignment is proteome-wide and is determined by the global connectivity pattern of the protein network. To validate the method, the protein-protein interaction network of the yeast Saccharomyces cerevisiae is analyzed. Compared with current network-based protein function prediction, our method substantially improves prediction quality when multiple data sources are used. Precision reached 82% in the stringent functional classification and 96% in the less detailed classification.
    Protein function prediction
    Proteome
    Interaction network
    Protein Interaction Networks
    Citations (0)
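For contrast with the global, proteome-wide assignment described above, a much simpler local baseline is majority voting over direct interaction partners. The sketch below shows only that baseline, not the method of the paper. The network, annotations, and function names are hypothetical.

```python
from collections import Counter


def predict_by_neighbors(network: dict, annotations: dict, protein: str, top_k: int = 1) -> list:
    """Return the top_k functions most common among a protein's direct partners.

    network maps a protein to its interaction partners;
    annotations maps a protein to its known function labels.
    """
    votes = Counter()
    for neighbor in network.get(protein, ()):
        votes.update(annotations.get(neighbor, ()))
    return [label for label, _ in votes.most_common(top_k)]


# Hypothetical toy network: P1 is unannotated and has three annotated partners.
network = {"P1": ["P2", "P3", "P4"]}
annotations = {
    "P2": ["transport"],
    "P3": ["transport", "signaling"],
    "P4": ["metabolism"],
}
print(predict_by_neighbors(network, annotations, "P1"))
```

A local vote uses only first-degree neighbors, so it cannot exploit the global connectivity pattern that the abstract identifies as the basis of its assignments.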
    Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new PNPRD features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. Our results indicate that sequence-based classifiers can provide good results among a broad range of proteins, that the proposed features are useful in predicting several functions, and that the combination of our and traditional features may support the creation of a discriminative feature set for specific protein functions.
    Protein function prediction
    Sequence (biology)
    Protein sequencing
    Similarity (geometry)
    Identification
    Citations (51)
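The abstract does not define the PNPRD features, so the following is only a rough illustration of what charge-based global and local sequence features could look like. It counts lysine and arginine as positively charged and aspartate and glutamate as negatively charged (histidine is left out as a simplifying assumption). The function name, feature names, and example sequence are hypothetical.

```python
POSITIVE = set("KR")  # assumption: Lys and Arg counted as positively charged
NEGATIVE = set("DE")  # assumption: Asp and Glu counted as negatively charged


def charge_features(sequence: str, start: int = 0, end=None) -> dict:
    """Charged-residue fractions over sequence[start:end].

    With the default arguments the features are global (full sequence);
    passing start/end yields local features for a region.
    """
    region = sequence[start:end]
    n = len(region)
    if n == 0:
        return {"pos_frac": 0.0, "neg_frac": 0.0, "net_diff": 0.0}
    pos = sum(residue in POSITIVE for residue in region)
    neg = sum(residue in NEGATIVE for residue in region)
    return {"pos_frac": pos / n, "neg_frac": neg / n, "net_diff": (pos - neg) / n}


seq = "MKRDEAAK"  # hypothetical sequence
print(charge_features(seq))        # global features over all 8 residues
print(charge_features(seq, 0, 4))  # local features over the first 4 residues
```

Feature vectors of this kind could then be passed to a classifier such as the random forest mentioned in the abstract.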
    Motivation: The detection of function-related local 3D-motifs in protein structures can provide insights into protein function in the absence of sequence or fold similarity. Protein loops are known to play important roles in protein function and several loop classifications have been described, but the automated identification of putative functional 3D-motifs in such classifications has not yet been addressed. Such identification can be used for sequence annotation. Results: We evaluated three different scoring methods for their ability to identify known motifs from the PROSITE database in ArchDB. More than 500 new putative function-related motifs not reported in PROSITE were identified. Sequence patterns derived from these motifs were especially useful for predicting precise annotations. The number of reliable sequence annotations could be increased by up to 100% with respect to standard BLAST. Contact: boliva@imim.es Supplementary information: Supplementary Data are available at Bioinformatics online.
    Protein function prediction
    Identification
    Sequence (biology)
    Protein sequencing
    Sequence motif
    Multiple sequence alignments have much to offer to the understanding of protein structure, evolution and function. We are developing approaches to use this information in predicting protein-binding specificity, intra-protein and protein-protein interactions, and in reconstructing protein interaction networks.
    Protein function prediction
    Sequence (biology)
    Protein sequencing
    Protein structure database
    Multiple sequence alignment
    Citations (9)
    Many protein features useful for prediction of protein function can be predicted from sequence, including posttranslational modifications, subcellular localization, and physical/chemical properties. We show here that such protein features are more conserved among orthologs than paralogs, indicating they are crucial for protein function and thus subject to selective pressure. This means that a function prediction method based on sequence-derived features may be able to discriminate between proteins with different function even when they have highly similar structure. Also, such a method is likely to perform well on organisms other than the one on which it was trained. We evaluate the performance of such a method, ProtFun, which relies on protein features as its sole input, and show that the method gives similar performance for most eukaryotes and performs much better than anticipated on archaea and bacteria. From this analysis, we conclude that for the posttranslational modifications studied, both the cellular use and the sequence motifs are conserved within Eukarya.
    Conserved sequence
    Sequence (biology)
    Protein sequencing
    Protein function prediction
    Sequence space
    Protein methods
    Citations (44)
    The prediction of a protein's function from its amino acid sequence is one of the most important tasks in bioinformatics. The traditional procedure of searching databases for related sequences and inferring the function from the best matches has several shortcomings and pitfalls. Alternatively, the sequence under study can be scrutinized for the occurrence of particular sequence signatures that can be associated with certain protein functionalities. Useful sequence signatures not only include short motifs such as protein modification sites or specific binding motifs but also encompass larger protein regions, such as homology domains. There exist a number of fundamentally different bioinformatical data structures, which can be used to store information about sequence signatures, thus making them available for the purpose of protein classification.
    Sequence (biology)
    Protein function prediction
    Protein sequencing
    Sequence logo
    Sequence motif
    Sequence homology
    Homology
    Protein structure database
    Information on protein subcellular localization is indispensable for studying protein function, as a protein can perform its function only after it is correctly transported to a specific subcellular compartment. Accurate prediction of protein subcellular localization is therefore very important in biological studies. In contrast to sequence features (e.g. amino acid composition), which are widely used in subcellular localization prediction, features derived from protein-protein interaction (PPI) are largely ignored, although they reflect the co-localization of different proteins. In this study, we propose a novel distance formula based on both protein sequence and PPI features, which precisely measures the similarity of proteins by incorporating protein information including amino acid composition, PPI, and the corresponding interaction scores. Based on this distance formula, we further introduce a k-nearest neighbor (KNN) algorithm for predicting subcellular localization. The results of a leave-one-out test on a benchmark dataset show that PPI features significantly improve the performance of protein subcellular localization prediction. This KNN algorithm also outperforms an SVM algorithm using the same features, suggesting the efficiency of the proposed algorithm for predicting protein subcellular localization.
    Protein sequencing
    Protein function prediction
    Benchmark (surveying)
    Sequence (biology)
    Citations (0)
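The abstract above does not give its distance formula, so the sketch below is one hypothetical way to combine amino acid composition with PPI scores in a KNN classifier. The linear weighting `alpha`, the use of `1 - score` as a PPI distance, and all names and data are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> list:
    """Amino acid composition as a 20-dimensional frequency vector."""
    counts = Counter(sequence)
    n = len(sequence) or 1
    return [counts[aa] / n for aa in AMINO_ACIDS]


def combined_distance(a: str, b: str, seqs: dict, ppi: dict, alpha: float = 0.5) -> float:
    """Hypothetical distance: weighted sum of composition distance and PPI distance.

    ppi maps frozenset({a, b}) to an interaction score in [0, 1];
    pairs with no recorded interaction are treated as score 0.
    """
    seq_dist = math.dist(composition(seqs[a]), composition(seqs[b]))
    ppi_dist = 1.0 - ppi.get(frozenset((a, b)), 0.0)
    return alpha * seq_dist + (1 - alpha) * ppi_dist


def knn_predict(query: str, labels: dict, seqs: dict, ppi: dict, k: int = 3, alpha: float = 0.5) -> str:
    """Majority label among the k labeled proteins closest to the query."""
    nearest = sorted(labels, key=lambda p: combined_distance(query, p, seqs, ppi, alpha))[:k]
    return Counter(labels[p] for p in nearest).most_common(1)[0][0]


# Hypothetical toy data.
seqs = {"Q": "MKKLLA", "A": "MKKLLA", "B": "GGSSGG", "C": "MKRLLA"}
labels = {"A": "nucleus", "B": "membrane", "C": "nucleus"}
ppi = {frozenset(("Q", "A")): 0.9}
print(knn_predict("Q", labels, seqs, ppi, k=3))
```

Setting `alpha=1.0` reduces this to a sequence-only KNN, which gives a simple way to compare performance with and without PPI features.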
    Accurately identifying functional sites in proteins is one of the most important topics in bioinformatics and systems biology. In bioinformatics, identifying protease cleavage sites in protein sequences can aid drug/inhibitor design. In systems biology, post-translational protein-protein interaction activity is one of the major components for analyzing signaling pathway activities. Determining functional sites through laboratory experiments is normally time-consuming and expensive, so computer programs have been widely used for this kind of task. Mining protein sequence data with computer programs involves two major issues: 1) discovering how amino acid specificity affects functional sites and 2) discovering what amino acid specificity is. Both require a suitable coding mechanism before a suitable machine learning algorithm can be applied. The development of the bio-basis function neural network (BBFNN) has opened a new avenue for protein sequence data mining. The bio-basis function used in BBFNN is biologically sound because it encodes the biological information in protein sequences well, i.e. it measures the similarity between protein sequences well. BBFNN has therefore outperformed conventional neural networks in many areas of protein sequence data mining, from protease cleavage site prediction to disordered protein identification. This review focuses on the variants of BBFNN and their applications in mining protein sequence data.
    Protein sequencing
    Protein function prediction
    Biological data
    Citations (13)