Protein Structure Prediction with Expectation Reflection
Citations: 0 | References: 43 | Related papers: 10
Abstract:
Sequence covariation in multiple sequence alignments of homologous proteins has been used extensively to obtain insights into protein structure. However, global statistical inference is required in order to ascertain direct relationships between amino acid positions in these sequences that are not simply secondary correlations induced by interactions with a third residue. Methods for statistical inference of such covariation have been developed to exploit the growing availability of sequence data. These hints about the folded protein structure provide critical a priori information for more detailed 3D predictions by neural networks. We present a novel method for protein structure inference using an iterative parameter-free model estimator which uses the formalism of statistical physics. With no tunable learning rate, our method scales to large system sizes while providing improved performance in the regime of small sample sizes. We apply this method to 40974 PDB structures and compare its performance to that of other methods. Our method outperforms existing methods for 76% of analysed proteins.
Keywords:
Statistical Inference
Sequence (biology)
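The covariation signal mentioned in the abstract can be illustrated with a minimal sketch. This is not the Expectation Reflection estimator; it is the simpler pairwise mutual information between alignment columns, i.e. the kind of local measure that cannot separate direct couplings from secondary correlations. The function names and the toy alignment are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy alignment (assumption): columns 0 and 1 covary perfectly,
# column 2 is conserved, column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
ranking = rank_column_pairs(toy_msa)
print(ranking[0])
```

In this toy case the highest-scoring pair is the perfectly covarying one. On real alignments, a high score of this kind may also arise from a chain of interactions through a third residue, which is the motivation the abstract gives for global inference.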
Secondary structure prediction is a problem in structural bioinformatics, the field that deals with the prediction and analysis of macromolecules, i.e. DNA, RNA and protein. It is an important step towards elucidating a macromolecule's three-dimensional structure, as well as its function. The secondary structure of a protein can be predicted from its primary structure, i.e. from its amino acid sequence, though challenges exist. Four methods are used for this: the statistical approach, the nearest-neighbor method, the neural network approach and the hidden Markov model approach. The artificial neural network (ANN) approach is the most successful of these. In this method, ANNs are trained to recognize amino acid patterns in known secondary structure units, and these patterns are used to distinguish between the different types of secondary structure. This work concerns the prediction of protein secondary structure using an artificial neural network, though it is initially restricted to three structures only.
Protein function prediction
Network Structure
Citations (10)
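As an illustration of the neural network approach described in the secondary structure entry above, the sketch below shows one common way to turn an amino acid sequence into fixed-size inputs: a sliding window of one-hot encoded residues centred on the position to be classified. The window size, the encoding and the function name are assumptions made for this sketch; the abstract does not specify the input representation or name the three structure classes.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def encode_windows(sequence, window=13):
    """One-hot encode a sliding window around every residue.

    Each window position uses 21 slots: 20 standard amino acids plus
    one extra slot for padding beyond the chain ends or unknown residues.
    """
    half = window // 2
    width = len(AMINO_ACIDS) + 1
    features = []
    for center in range(len(sequence)):
        vector = []
        for pos in range(center - half, center + half + 1):
            one_hot = [0] * width
            if 0 <= pos < len(sequence):
                # Unknown residue letters fall into the extra slot.
                one_hot[AA_INDEX.get(sequence[pos], width - 1)] = 1
            else:
                one_hot[width - 1] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features


# Hypothetical short sequence; each row would be one training example
# whose label is the secondary structure class of the central residue.
feats = encode_windows("ACDK", window=3)
print(len(feats), len(feats[0]))
```

Each row can then be paired with the structure class of its central residue and passed to any classifier; a window lets the network see the local sequence context rather than a single residue.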
Network Structure
Citations (708)
Sequence (biology)
Loop modeling
Protein design
Alignment-free sequence analysis
Sequence logo
Multiple sequence alignment
Structural alignment
Protein structure database
Protein sequencing
Citations (144)
An analysis of the occurrence of tetrapeptides in 35 globular proteins for alpha-helix, beta-structure and coil was performed. We concluded that: the conformation of a short polypeptide segment cannot be determined on the basis of the knowledge of the amino acid sequence only; local structures of a protein are formed as the result of interactions within the whole structural domain of the protein as well as interactions with the environment.
Globular protein
Sequence (biology)
Helix (gastropod)
Alpha helix
Coiled coil
Loop modeling
Citations (3)
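The kind of tetrapeptide analysis described in the entry above can be sketched as a simple tally: for every four-residue segment, count how often it occurs in each conformation. The labelling scheme (H for alpha-helix, E for beta-structure, C for coil, "mixed" when the four residues do not share one label) and the toy data are assumptions for illustration, not the procedure or data of the original study.

```python
from collections import Counter, defaultdict


def tetrapeptide_conformations(proteins, k=4):
    """Tally the conformations observed for every k-residue segment.

    `proteins` is a list of (sequence, states) pairs, where `states`
    holds one conformation label per residue.
    """
    table = defaultdict(Counter)
    for sequence, states in proteins:
        if len(sequence) != len(states):
            raise ValueError("sequence and state strings differ in length")
        for start in range(len(sequence) - k + 1):
            segment = sequence[start:start + k]
            labels = set(states[start:start + k])
            # A segment gets a single label only if all residues agree.
            conformation = labels.pop() if len(labels) == 1 else "mixed"
            table[segment][conformation] += 1
    return table


# Toy example (assumption): the same tetrapeptide AVLA appears once
# in a helix and once in a strand.
toy_proteins = [("AVLAGAVLA", "HHHHCEEEE")]
table = tetrapeptide_conformations(toy_proteins)
print(dict(table["AVLA"]))
```

A segment that appears under more than one label, as AVLA does here, is the situation behind the entry's conclusion that local sequence alone does not determine local conformation.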
Local protein structure prediction efforts have consistently failed to exceed approximately 70% accuracy. We characterize the degeneracy of the mapping from local sequence to local structure responsible for this failure by investigating the extent to which similar sequence segments found in different proteins adopt similar three-dimensional structures. Sequence segments 3-15 residues in length from 154 different protein families are partitioned into neighborhoods containing segments with similar sequences using cluster analysis. The consistency of the sequence-to-structure mapping is assessed by comparing the local structures adopted by sequence segments in the same neighborhood in proteins of known structure. In the 154 families, 45% and 28% of the positions occur in neighborhoods in which one and two local structures predominate, respectively. The sequence patterns that characterize the neighborhoods in the first class probably include virtually all of the short sequence motifs in proteins that consistently occur in a particular local structure. These patterns, many of which occur in transitions between secondary structural elements, are an interesting combination of previously studied and novel motifs. The identification of sequence patterns that consistently occur in one or a small number of local structures in proteins should contribute to the prediction of protein structure from sequence.
Sequence (biology)
Sequence logo
Loop modeling
Local structure
Degeneracy (biology)
Structural alignment
Citations (122)
In spite of the fact that there has been a significant increase in the number of solved protein structures, structural information is missing for many proteins. Although structural information is codified in the amino acid sequence, computational prediction using only this information is still an unsolved problem. However, one successful method to model a protein's structure starting from the primary sequence is to use contact prediction derived from multiple sequence alignment (MSA). Here we use our contact predictor PconsC4 to generate a list of probable contacts between residues in the primary sequences. These contacts are then used together with the secondary structure prediction as constraints for the CONFOLD folding method. In this way, a 3D protein model can be built starting directly from the primary sequence. © 2019 by John Wiley & Sons, Inc.
Sequence (biology)
Protein sequencing
Protein primary structure
Folding (DSP implementation)
CASP
Citations (5)
Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but follow precise rules governed by the structure of the protein and are thus interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts independently of the number of aligned sequences, residue separation or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inference, PconsC2 is superior to state-of-the-art machine learning methods for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.
Statistical Inference
Citations (163)
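Contact predictors such as the one in the entry above are often compared by the precision of their highest-ranked predictions. The sketch below computes that figure under stated assumptions: predictions are (i, j, score) triples, the top L pairs are evaluated for a protein of length L, and pairs closer than six residues in sequence are ignored. These conventions and the function name are illustrative and are not taken from the PconsC2 abstract.

```python
def top_l_precision(predicted, true_contacts, length,
                    min_separation=6, fraction=1.0):
    """Fraction of the top-ranked predicted contacts that are true.

    predicted: iterable of (i, j, score) with residue indices i, j.
    true_contacts: iterable of (i, j) pairs observed in the structure.
    length: protein length L; the top int(L * fraction) pairs are scored.
    """
    candidates = [
        (i, j, score) for i, j, score in predicted
        if abs(i - j) >= min_separation
    ]
    candidates.sort(key=lambda c: c[2], reverse=True)
    top = candidates[:max(1, int(length * fraction))]
    if not top:
        return 0.0
    # Store pairs in sorted order so (i, j) and (j, i) match.
    truth = {tuple(sorted(pair)) for pair in true_contacts}
    hits = sum(1 for i, j, _ in top if tuple(sorted((i, j))) in truth)
    return hits / len(top)


# Hypothetical predictions: the pair (2, 3) is dropped by the
# sequence-separation filter despite having the highest score.
predicted = [(0, 10, 0.9), (2, 3, 0.95), (1, 12, 0.8), (4, 15, 0.1)]
true_contacts = {(0, 10), (4, 15)}
precision = top_l_precision(predicted, true_contacts, length=2)
print(precision)
```

Filtering by sequence separation keeps trivially adjacent residues from inflating the score, which matters because long-range contacts carry most of the information useful for structure prediction.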
Sequence (biology)
Loop modeling
Multiple sequence alignment
Structural alignment
Protein sequencing
Alignment-free sequence analysis
Citations (0)