Investigating the world of protein design
2016
In this thesis, methods are developed for the design of protein structures based only on abstract structural descriptions of the protein fold. The design protocol starts with rough alpha-carbon (backbone) models and progresses through several stages of high-resolution refinement, sequence design and filtering according to known principles of protein structure, coarse-grained and high-resolution knowledge-based potentials, and secondary and tertiary structure prediction methods. Following this protocol led to the identification of protein solubility as a major limiting factor in the progression of designs. To overcome this problem, a systematic analysis was undertaken focusing on the more restricted problem of sequence redesign of native structures, with the aim of finding common principles that govern viable protein sequences and can then be applied to the design of novel structures. A factorial design of experiments approach was used to screen a multitude of sequence redesigns for known backbones that each possess a unique set of properties. The behaviour of each redesign was characterised when expressed in E. coli in the hope of elucidating common features indicative of a viable sequence. Even with the use of fractional factorial design, the number of experimental designs required was limiting, so to test the approach we adopted an ab initio prediction method as a proxy for the "wet" experiments. This allowed us to increase the information gain for each experiment, and we found that optimising towards secondary structure prediction had a negative effect on the predictability of sequences. Following this, an improved design methodology was developed that uses a genetic algorithm to produce sequence redesigns that look realistic according to a number of computational measures, such as sequence composition and both ab initio and comparative modelling prediction methods, and that have a high degree of native sequence recapitulation (a good indicator of a viable design method). 
Although these state-of-the-art measures all point toward our sequence designs being on par with (or better than) native sequences, the problem of solubility and foldedness remained. Machine learning techniques were used to unravel some of the complex intricacies governing these two properties. Using this approach, we discovered features that a substantial portion of native sequences comply with but that were missing from our designs. Using the genetic algorithm design method, we directed our sequences to this area of attribute space in the hope that this would produce more realistic designs. Unfortunately, the solubility of designs remained a large problem. We also tested the relatively new technique of convergent peptide synthesis to understand how valuable it could be in a synthetic biology context. After designing a single leucine-rich repeat, we used peptide chemistry to build the protein synthetically before investigating its solubility and foldedness. Upon producing a single repeat of this designed protein, multiple repeats were linked together and the resulting properties characterised. A single 28-mer peptide was produced that was soluble and folded.
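
As an illustration of the fractional factorial screening mentioned above, the following is a minimal sketch of how a two-level half-fraction design can be generated. It is not the procedure used in the thesis: the factor names A to D, the generator D = ABC, and the function name `fractional_factorial` are all assumptions chosen for the example. The point it shows is that four factors can be screened in 8 runs rather than the 16 a full factorial would need, at the cost of confounding the fourth factor with a three-way interaction.

```python
from itertools import product


def fractional_factorial(base_factors, generators):
    """Build a two-level fractional factorial design (levels coded -1/+1).

    base_factors: names of the factors varied as a full factorial.
    generators:   maps each derived factor name to the base factors whose
                  product defines its level (hypothetical example: D = A*B*C).
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        for name, parents in generators.items():
            value = 1
            for parent in parents:
                value *= run[parent]
            run[name] = value  # derived level is fixed by the generator
        runs.append(run)
    return runs


# Hypothetical half-fraction: 4 factors in 2**3 = 8 runs instead of 2**4 = 16.
design = fractional_factorial(["A", "B", "C"], {"D": ("A", "B", "C")})

for run in design:
    print(run)
```

Each row of `design` would correspond to one sequence redesign to build and test, with the coded levels standing in for whichever design properties are being switched between two settings.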