Protein Structure Prediction with Expectation Reflection

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Evan Cresswell-Clay, Danh-Tai Hoang, Joe McKenna, Chris Yang, Éric Zhang, Vipul Periwal

Abstract:

Sequence covariation in multiple sequence alignments of homologous proteins has been used extensively to obtain insights into protein structure. However, global statistical inference is required in order to ascertain direct relationships between amino acid positions in these sequences that are not simply secondary correlations induced by interactions with a third residue. Methods for statistical inference of such covariation have been developed to exploit the growing availability of sequence data. These hints about the folded protein structure provide critical a priori information for more detailed 3D predictions by neural networks. We present a novel method for protein structure inference using an iterative parameter-free model estimator which uses the formalism of statistical physics. With no tunable learning rate, our method scales to large system sizes while providing improved performance in the regime of small sample sizes. We apply this method to 40974 PDB structures and compare its performance to that of other methods. Our method outperforms existing methods for 76% of analysed proteins.
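As a rough illustration of the covariation signal the abstract refers to, the sketch below computes the mutual information between two columns of a toy alignment. This is not the Expectation Reflection estimator from the paper. It is a simple local pairwise statistic, shown only to make "covariation" concrete, and it is exactly the kind of measure that cannot separate direct couplings from correlations induced through a third residue. The function name `column_mutual_information` and the toy alignment are assumptions made for this example.

```python
from collections import Counter
from math import log


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences (hypothetical input
    format chosen for this sketch).
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    left_counts = Counter(seq[i] for seq in msa)
    right_counts = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to limit rounding
        ratio = p_ab * n * n / (left_counts[a] * right_counts[b])
        mi += p_ab * log(ratio)
    return mi


# Toy alignment: columns 0 and 1 covary perfectly, columns 0 and 2 do not.
toy_msa = ["ACDA", "ACEA", "GTDA", "GTEA"]
covarying = column_mutual_information(toy_msa, 0, 1)
independent = column_mutual_information(toy_msa, 0, 2)
```

In this toy case the perfectly covarying pair scores log 2 and the independent pair scores zero. Real alignments additionally need sequence reweighting and gap handling, which this sketch omits.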

Keywords:

Statistical Inference

Sequence (biology)

Topics:

Protein Structure and Dynamics

Machine Learning in Bioinformatics

Metabolomics and Mass Spectrometry Studies

10.1101/2022.07.12.499755

Protein Structure Prediction using Artificial Neural Network

Hemashree Bordoloi, Kandarpa Kumar Sarma

Secondary structure prediction is a problem in structural bioinformatics, the field concerned with the prediction and analysis of macromolecules, i.e. DNA, RNA and protein. It is an important step towards elucidating a protein's three-dimensional structure, as well as its function. The secondary structure of a protein can be predicted from its primary structure, i.e. from its amino acid sequence, though challenges exist. Four methods are used for this: the Statistical Approach, the Nearest Neighbor method, the Neural Network Approach and the Hidden Markov Model Approach. The Artificial Neural Network (ANN) approach is the most successful of these for predicting protein secondary structure. In this method, ANNs are trained to recognize amino acid patterns in known secondary structure units, and these patterns are used to distinguish between the different types of secondary structure. This work concerns the prediction of protein secondary structure using an artificial neural network, though it is initially restricted to three structures only.
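The abstract does not describe how sequences are presented to the network. A common choice for this kind of predictor is a sliding window of one-hot encoded residues centred on the position being classified, and the sketch below shows that input encoding only. The window length of 13, the extra padding slot, and the name `encode_windows` are assumptions for illustration, not details taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_windows(sequence, window=13):
    """Return one feature vector per residue.

    Each vector concatenates one-hot encodings of the residues in a window
    centred on that residue. `window` is assumed to be odd. Positions that
    fall off either end of the sequence activate an extra padding slot.
    Non-standard residue letters are not handled and raise KeyError.
    """
    half = window // 2
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    width = len(AMINO_ACIDS) + 1  # 20 residues + 1 padding slot
    features = []
    for center in range(len(sequence)):
        vector = []
        for pos in range(center - half, center + half + 1):
            one_hot = [0] * width
            if 0 <= pos < len(sequence):
                one_hot[index[sequence[pos]]] = 1
            else:
                one_hot[-1] = 1  # window extends past the sequence end
            vector.extend(one_hot)
        features.append(vector)
    return features


# Small example with a window of 3 so the vectors stay short.
example = encode_windows("ACDEF", window=3)
```

Each vector would then be paired with the secondary structure label of its central residue (for example helix, strand or coil) to train a classifier.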

Protein function prediction

Network Structure

Citations (10)

Improvements in protein secondary structure prediction by an enhanced neural network

Journal of Molecular Biology (1990)

D G Kneller, Fred E. Cohen, Robert Langridge

Network Structure

10.1016/0022-2836(90)90154-e

Citations (708)

Protein Structure Prediction with Expectation Reflection

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Evan Cresswell-Clay, Danh-Tai Hoang, Joe McKenna, Chris Yang, Éric Zhang

Statistical Inference

Sequence (biology)

10.1101/2022.07.12.499755

Citations (0)

Prediction of Protein Structure by Evaluation of Sequence-structure Fitness

Journal of Molecular Biology (1993)

Christos Ouzounis, Chris Sander, Michael E. Scharf, Reinhard Schneider

Sequence (biology)

Loop modeling

Protein design

Alignment-free sequence analysis

Sequence logo

Multiple sequence alignment

Structural alignment

Protein structure database

Protein sequencing

10.1006/jmbi.1993.1433

Citations (144)

Formation of the local secondary structure of proteins: local sequence or environment.

PubMed (1986)

Danuta Płochocka, Jarosław Kosiński, Andrzej Rabczenko

An analysis of the occurrence of tetrapeptides in 35 globular proteins for alpha-helix, beta-structure and coil was performed. We concluded that: the conformation of a short polypeptide segment cannot be determined on the basis of the knowledge of the amino acid sequence only; local structures of a protein are formed as the result of interactions within the whole structural domain of the protein as well as interactions with the environment.

Globular protein

Sequence (biology)

Helix (gastropod)

Alpha helix

Coiled coil

Loop modeling

Citations (3)

Global properties of the mapping between local amino acid sequence and local structure in proteins.

Proceedings of the National Academy of Sciences (1996)

Karen F. Han, David Baker

Local protein structure prediction efforts have consistently failed to exceed approximately 70% accuracy. We characterize the degeneracy of the mapping from local sequence to local structure responsible for this failure by investigating the extent to which similar sequence segments found in different proteins adopt similar three-dimensional structures. Sequence segments 3-15 residues in length from 154 different protein families are partitioned into neighborhoods containing segments with similar sequences using cluster analysis. The consistency of the sequence-to-structure mapping is assessed by comparing the local structures adopted by sequence segments in the same neighborhood in proteins of known structure. In the 154 families, 45% and 28% of the positions occur in neighborhoods in which one and two local structures predominate, respectively. The sequence patterns that characterize the neighborhoods in the first class probably include virtually all of the short sequence motifs in proteins that consistently occur in a particular local structure. These patterns, many of which occur in transitions between secondary structural elements, are an interesting combination of previously studied and novel motifs. The identification of sequence patterns that consistently occur in one or a small number of local structures in proteins should contribute to the prediction of protein structure from sequence.

Sequence (biology)

Sequence logo

Loop modeling

Local structure

Degeneracy (biology)

Structural alignment

10.1073/pnas.93.12.5814

Citations (122)

Using PconsC4 and PconsFold2 to Predict Protein Structure

Current Protocols in Bioinformatics (2019)

Claudio Bassot, David Menéndez Hurtado, Arne Elofsson

In spite of the fact that there has been a significant increase in the number of solved protein structures, structural information is missing for many proteins. Although structural information is codified in the amino acid sequence, computational prediction using only this information is still an unsolved problem. However, one successful method to model a protein's structure starting from the primary sequence is to use contact prediction derived from multiple sequence alignment (MSA). Here we use our contact predictor PconsC4 to generate a list of probable contacts between residues in the primary sequences. These contacts are then used together with the secondary structure prediction as constraints for the CONFOLD folding method. In this way, a 3D protein model can be built starting directly from the primary sequence. © 2019 by John Wiley & Sons, Inc.

Sequence (biology)

Protein sequencing

Protein primary structure

Folding (DSP implementation)

CASP

10.1002/cpbi.75

Citations (5)

Protein Structure Prediction with Expectation Reflection

Authorea (Authorea) (2022)

Evan Cresswell-Clay, Danh-Tai Hoang, Joe McKenna, Éric Zhang, Vipul Periwal

Statistical Inference

Sequence (biology)

10.22541/au.165759315.54143497/v1

Citations (0)

Improved Contact Predictions Using the Recognition of Protein Like Contact Patterns

PLoS Computational Biology (2014)

Marcin J. Skwark, Daniele Raimondi, Mirco Michel, Arne Elofsson

Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts, independent of the number of aligned sequences, residue separation or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inference, PconsC2 is superior to state-of-the-art machine learning methods for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.

Statistical Inference

10.1371/journal.pcbi.1003889

Citations (163)

Author Index for Volume 239, 1-3

Journal of Molecular Biology (1994)

Sequence (biology)

Loop modeling

Multiple sequence alignment

Structural alignment

Protein sequencing

Alignment-free sequence analysis