A library of recombinant glutathione transferases (GSTs) generated by shuffling of DNA encoding human GST M1-1 and GST M2-2 was screened with eight alternative substrates, and the activities were subjected to multivariate analysis. Assays were made in lysates of bacteria in which the GST variants had been expressed. The primary data showed clustering of the activities in eight-dimensional substrate-activity space. For an incisive analysis, the rows of the data matrix, corresponding to the different enzyme variants, were individually scaled to unit length, thus accounting for different expression levels of the enzymes. The columns representing the activities with alternative substrates were subsequently individually normalized to unit variance and a zero mean. By this standardization, the data were adjusted to comparable orders of magnitude. Three molecular quasi-species were recognized by multivariate K-means and principal component analyses. Two of them encompassed the parental GST M1-1 and GST M2-2. A third one diverged functionally by displaying enhanced activities with some substrates and suppressed activities with signature substrates for GST M1-1 and GST M2-2. A fourth cluster contained mutants with impaired functions and was not regarded as a quasi-species. Sequence analysis of representatives of the mutant clusters demonstrated that the majority of the variants in the diverging novel quasi-species were structurally similar to the M1-like GSTs, but distinguished themselves from GST M1-1 by a Ser to Thr substitution in the active site. The data show that multivariate analysis of functional profiles can identify small structural changes influencing the evolution of enzymes with novel substrate-activity profiles.
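The two-step standardization described in this abstract can be illustrated with a minimal sketch. It assumes a small, made-up activity matrix (rows are enzyme variants, columns are substrates) and uses the population standard deviation for the column scaling; the source specifies neither the data nor the variance estimator, so the function names and numbers below are hypothetical.

```python
import math

def scale_rows_to_unit_length(matrix):
    """Divide each row (one enzyme variant) by its Euclidean norm."""
    scaled = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        # An all-zero row (no detectable activity) is left unchanged.
        scaled.append([v / norm for v in row] if norm > 0 else list(row))
    return scaled

def standardize_columns(matrix):
    """Shift each column (one substrate) to zero mean and unit variance."""
    n = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(col) / n for col in columns]
    # Assumption: population standard deviation (divide by n, not n - 1).
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in matrix
    ]

# Hypothetical activities: 4 variants x 3 substrates (illustrative values only).
activities = [
    [10.0, 2.0, 1.0],
    [20.0, 4.0, 2.0],  # same profile as the first variant at twice the level
    [1.0, 8.0, 3.0],
    [0.5, 3.0, 6.0],
]

unit_rows = scale_rows_to_unit_length(activities)
standardized = standardize_columns(unit_rows)
```

After row scaling, the first two hypothetical variants have identical profiles, which is how this step removes differences caused only by expression level. The standardized matrix would then be the input to a clustering or principal component step.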
Molecular evolution is frequently portrayed by structural relationships, but delineation of separate functional species is more elusive. We have generated enzyme variants by stochastic recombinations of DNA encoding two homologous detoxication enzymes, human glutathione transferases M1-1 and M2-2, and explored their catalytic versatilities. Sampled mutants were screened for activities with eight alternative substrates, and the activity fingerprints were subjected to principal component analysis. This phenotype characterization clearly identified at least three distributions of substrate selectivity, one of which was orthogonal to the parent-like distributions. This approach to evolutionary data mining serves to identify emerging molecular quasi-species and indicates potential trajectories available for further protein evolution.
New functional properties and altered structures of proteins arise in natural molecular evolution. The underlying mechanisms are mimicked in vitro by DNA shuffling and other techniques. Experimental and theoretical studies show that evolution does not operate on single individuals, but rather on ensembles of mutants with a stochastic distribution of structural and functional properties. To evolve proteins with novel functions, it is of great value to be able to recognize these evolving units, called “quasi-species”. Our approach to molecular evolution was to explore the library of variants obtained by DNA shuffling of the human GST M1-1 and GST M2-2 enzymes. They share 84% sequence identity at the protein level but show different functional profiles. A total of 384 individuals from the library were screened for catalytic activities with eight substrates undergoing different kinds of chemical transformation, including both substitution and addition reactions. Principal component analysis (PCA) was applied to the whole data set to find functional fingerprints of the sampled individuals. This approach identified at least three different distributions of substrate selectivity profiles, two of which have properties similar to those of the parents M1-1 and M2-2, respectively, and one that shows a new functional profile. This new “quasi-species” indicates potential trajectories that may be exploited for functional progression, both in natural evolution and in protein engineering. Supported by the Swedish Research Council, the Swedish Cancer Society and the Carl Trygger Foundation.
The directed evolution of protein function frequently involves identification of mutants with improved properties from a population of variants obtained by mutagenesis. The selection of clones to parent the subsequent generation is crucial to the continued creation of superior progeny. In the present study, multivariate analysis guided the evolution of human glutathione transferase (GST) T1-1 to 65-fold enhanced alkyltransferase activity. Six alternative substrates were used to monitor the substrate-activity space characterizing a mutant library of enzymes obtained by recombination of DNA and heterologous expression in Escherichia coli. A subset of mutants was identified by their proximity in the targeted region of six-dimensional factor space. DNA from these mutants was recombined to create a new generation of GST variants from which an improved enzyme was isolated. The multidimensional cluster analysis is applicable to quantitative properties in any population of molecules undergoing evolution and can guide the tailoring of proteins, nucleic acids and other chemical structures to novel and improved functions.
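Selecting mutants by proximity in factor space can be sketched as a nearest-neighbour ranking. The abstract does not state how the targeted region was defined or which distance measure was used, so the example below assumes a single target point and Euclidean distance. The mutant names, coordinates and the `nearest_to_target` helper are hypothetical.

```python
import math

def nearest_to_target(profiles, target, k):
    """Return the names of the k variants closest to `target`.

    Assumption: closeness is measured by Euclidean distance.
    """
    ranked = sorted(profiles, key=lambda name: math.dist(profiles[name], target))
    return ranked[:k]

# Hypothetical coordinates of four mutants in six-dimensional factor space.
profiles = {
    "mutant_A": [0.9, 0.1, 0.0, 0.2, 0.1, 0.0],
    "mutant_B": [0.1, 0.8, 0.1, 0.0, 0.3, 0.1],
    "mutant_C": [0.8, 0.2, 0.1, 0.1, 0.0, 0.1],
    "mutant_D": [0.0, 0.1, 0.9, 0.4, 0.2, 0.3],
}

# Hypothetical target: high score on the first factor only.
target = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

selected = nearest_to_target(profiles, target, k=2)
print(selected)  # ['mutant_A', 'mutant_C']
```

In a workflow like the one described, the selected variants would be the candidates whose DNA is recombined to create the next generation.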