    Protein Structure Prediction with Expectation Reflection
    Citations: 0 · References: 43 · Related papers: 10
    Abstract:
    Sequence covariation in multiple sequence alignments of homologous proteins has been used extensively to obtain insights into protein structure. However, global statistical inference is required in order to ascertain direct relationships between amino acid positions in these sequences that are not simply secondary correlations induced by interactions with a third residue. Methods for statistical inference of such covariation have been developed to exploit the growing availability of sequence data. These hints about the folded protein structure provide critical a priori information for more detailed 3D predictions by neural networks. We present a novel method for protein structure inference using an iterative parameter-free model estimator which uses the formalism of statistical physics. With no tunable learning rate, our method scales to large system sizes while providing improved performance in the regime of small sample sizes. We apply this method to 40974 PDB structures and compare its performance to that of other methods. Our method outperforms existing methods for 76% of analysed proteins.
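As an illustration of what "sequence covariation" means here, the sketch below computes the mutual information between two columns of a toy alignment. This is only the naive pairwise measure, which cannot separate direct couplings from correlations induced through a third residue; it is not the Expectation Reflection estimator described in the abstract. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences. This is a local,
    pairwise statistic: it does not remove indirect (third-residue) effects.
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)            # single-column residue counts
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log(p_ab * n * n / (count_i[a] * count_j[b]))
    return mi

# Hypothetical toy alignment: columns 0 and 3 covary, columns 0 and 2 do not.
toy_msa = ["ACDA", "ACEA", "GCDG", "GCEG"]
mi_covarying = column_mutual_information(toy_msa, 0, 3)
mi_independent = column_mutual_information(toy_msa, 0, 2)
```

In this toy case the perfectly covarying pair scores log 2 and the independent pair scores 0; global inference methods of the kind the abstract refers to start from the same alignment statistics but fit a joint model over all positions.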
    Keywords:
    Statistical Inference
    Sequence (biology)
    Secondary structure prediction is a problem in structural bioinformatics, the field that deals with the prediction and analysis of macromolecules, i.e. DNA, RNA and protein. It is an important step towards elucidating a protein's three-dimensional structure as well as its function. The secondary structure of a protein can be predicted from its primary structure, i.e. from its amino acid sequence, though challenges exist. Four methods are used for this: the statistical approach, the nearest-neighbor method, the neural network approach and the hidden Markov model approach. The artificial neural network (ANN) approach is the most successful of these for predicting protein secondary structure. In this method, ANNs are trained to recognize amino acid patterns in known secondary structure units, and these patterns are used to distinguish between the different types of secondary structure. This work concerns the prediction of protein secondary structure with an artificial neural network, restricted initially to three structures only.
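The abstract above does not say how amino acid patterns are presented to the network. A common choice, assumed here purely for illustration, is a one-hot encoded sliding window centred on the residue whose secondary structure is being predicted. The names `encode_window` and `half_width` are hypothetical, and the sketch handles only the 20 standard residues.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}

def encode_window(sequence, center, half_width=2):
    """One-hot encode the residues around `center` as a flat feature vector.

    Positions that fall outside the sequence are encoded as all zeros.
    A non-standard residue letter raises KeyError in this sketch.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            one_hot[AA_INDEX[sequence[pos]]] = 1
        features.extend(one_hot)
    return features

# Window of 5 residues centred on the first residue of a made-up sequence.
window = encode_window("MKTAYIAK", center=0, half_width=2)
```

Each window yields a vector of length (2 * half_width + 1) * 20 that could serve as the input layer of a classifier over the three structure classes mentioned above.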
    Protein function prediction
    Network Structure
    Citations (10)
    Sequence (biology)
    Loop modeling
    Protein design
    Alignment-free sequence analysis
    Sequence logo
    Multiple sequence alignment
    Structural alignment
    Protein structure database
    Protein sequencing
    Citations (144)
    An analysis of the occurrence of tetrapeptides in 35 globular proteins for alpha-helix, beta-structure and coil was performed. We concluded that the conformation of a short polypeptide segment cannot be determined from knowledge of the amino acid sequence alone, and that local structures of a protein are formed as the result of interactions within the whole structural domain of the protein as well as interactions with the environment.
    Globular protein
    Sequence (biology)
    Helix (gastropod)
    Alpha helix
    Coiled coil
    Loop modeling
    Citations (3)
    Local protein structure prediction efforts have consistently failed to exceed approximately 70% accuracy. We characterize the degeneracy of the mapping from local sequence to local structure responsible for this failure by investigating the extent to which similar sequence segments found in different proteins adopt similar three-dimensional structures. Sequence segments 3-15 residues in length from 154 different protein families are partitioned into neighborhoods containing segments with similar sequences using cluster analysis. The consistency of the sequence-to-structure mapping is assessed by comparing the local structures adopted by sequence segments in the same neighborhood in proteins of known structure. In the 154 families, 45% and 28% of the positions occur in neighborhoods in which one and two local structures predominate, respectively. The sequence patterns that characterize the neighborhoods in the first class probably include virtually all of the short sequence motifs in proteins that consistently occur in a particular local structure. These patterns, many of which occur in transitions between secondary structural elements, are an interesting combination of previously studied and novel motifs. The identification of sequence patterns that consistently occur in one or a small number of local structures in proteins should contribute to the prediction of protein structure from sequence.
    Sequence (biology)
    Sequence logo
    Loop modeling
    Local structure
    Degeneracy (biology)
    Structural alignment
    Citations (122)
    In spite of the fact that there has been a significant increase in the number of solved protein structures, structural information is missing for many proteins. Although structural information is codified in the amino acid sequence, computational prediction using only this information is still an unsolved problem. However, one successful method to model a protein's structure starting from the primary sequence is to use contact prediction derived from multiple sequence alignment (MSA). Here we use our contact predictor PconsC4 to generate a list of probable contacts between residues in the primary sequences. These contacts are then used together with the secondary structure prediction as constraints for the CONFOLD folding method. In this way, a 3D protein model can be built starting directly from the primary sequence. © 2019 by John Wiley & Sons, Inc.
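The step of turning predicted contact probabilities into "a list of probable contacts" can be sketched as a simple ranking and filtering. The code below is a minimal illustration, not the PconsC4 or CONFOLD interface: the input format, the minimum sequence separation of 6 and the default of keeping as many contacts as there are residues are all assumptions made for the example.

```python
def top_contacts(scores, seq_len, min_separation=6, n_top=None):
    """Return the highest-scoring residue pairs as (i, j, probability).

    `scores` maps (i, j) with i < j to a predicted contact probability.
    Pairs closer than `min_separation` along the chain are dropped because
    they are trivially near each other. `n_top` defaults to `seq_len`.
    """
    if n_top is None:
        n_top = seq_len
    candidates = [
        (p, i, j) for (i, j), p in scores.items() if j - i >= min_separation
    ]
    candidates.sort(key=lambda item: -item[0])  # highest probability first
    return [(i, j, p) for p, i, j in candidates[:n_top]]

# Made-up probabilities for a 13-residue chain.
example_scores = {(0, 10): 0.9, (2, 4): 0.99, (1, 8): 0.5, (3, 12): 0.7}
selected = top_contacts(example_scores, seq_len=13, n_top=2)
```

A list of this shape is the kind of input a restraint-based folding step would consume alongside the secondary structure prediction.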
    Sequence (biology)
    Protein sequencing
    Protein primary structure
    Folding (DSP implementation)
    CASP
    Citations (5)
    Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts, independent of the number of aligned sequences, residue separation or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inference, PconsC2 is superior to state-of-the-art machine learning methods for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.
    Statistical Inference
    Citations (163)
    Sequence (biology)
    Loop modeling
    Multiple sequence alignment
    Structural alignment
    Protein sequencing
    Alignment-free sequence analysis
    Citations (0)