The TM0604 gene of Thermotoga maritima encodes a single-stranded DNA-binding protein (SSB) with a molecular weight of 16,166 Da (residues 1–141) and a calculated isoelectric point of 4.8. SSB plays essential roles in DNA replication, recombination, and repair.1-3 SSB, also known as helix-destabilizing protein, binds tightly to single-stranded DNA (ssDNA) as a homotetramer. SSB from E. coli can bind long ssDNAs in two main binding modes, which occlude 35 or 65 nucleotides per tetramer, depending on how many SSB subunits from the tetramer interact with ssDNA. In the SSB35 binding mode, two SSB subunits per tetramer interact with ssDNA and self-assemble into long clusters. In the SSB65 binding mode, all four SSB subunits interact with ssDNA and do not form clusters.4 Closely related variants of SSB are encoded in the genome of a variety of large, self-transmissible plasmids. The eukaryotic mitochondrial proteins that bind ssDNA are structurally and evolutionarily related to prokaryotic SSB and are probably involved in mitochondrial DNA replication.5 Here, we report the crystal structure of TM0604, which was determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).6 Protein Production and Crystallization. SSB from Thermotoga maritima (TIGR: TM0604, Swiss-Prot: Q9WZ73) was amplified by PCR from genomic DNA using PfuTurbo (Stratagene) and primer pairs encoding the predicted 5′- and 3′-ends. The PCR product was cloned into plasmid pMH2T7, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The cloning junctions were confirmed by sequencing. Protein expression was performed in a modified Terrific Broth using the Escherichia coli strain DL41. Lysozyme was added to the culture at the end of fermentation to a final concentration of 250 μg/mL. 
Bacteria were lysed by sonication after a freeze/thaw procedure in Lysis Buffer [50 mM Tris pH 7.9, 50 mM NaCl, 10 mM imidazole, 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)], and the cell debris was pelleted by centrifugation at 3400 × g for 60 min. The soluble fraction was applied to a nickel-chelating resin (GE Healthcare) pre-equilibrated with Lysis Buffer. The resin was washed with Wash Buffer [50 mM potassium phosphate pH 7.8, 300 mM NaCl, 40 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP], and the target protein was eluted with Elution Buffer [20 mM Tris pH 7.9, 300 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP]. The eluate was buffer exchanged into Buffer Q [20 mM Tris pH 7.9, 5% (v/v) glycerol, 0.25 mM TCEP] containing 50 mM NaCl and applied to a RESOURCE Q column (GE Healthcare) pre-equilibrated with the same buffer. The target protein was eluted using a linear gradient of 50 to 500 mM NaCl in Buffer Q. The appropriate RESOURCE Q fractions were pooled and further purified using a Superdex 200 column (GE Healthcare) with elution in Crystallization Buffer [20 mM Tris pH 7.9, 150 mM NaCl, 0.25 mM TCEP]. The appropriate Superdex 200 fractions were pooled and concentrated for crystallization assays to 10 mg/mL by centrifugal ultrafiltration (Millipore). Molecular weight and oligomeric state of the target protein were determined using a 1.0 × 30 cm Superdex 200 column (GE Healthcare) in combination with static light scattering (Wyatt Technology). The mobile phase consisted of 20 mM Tris pH 8.0, 150 mM NaCl. The protein was crystallized using the nanodroplet vapor diffusion method7 with standard JCSG crystallization protocols.6 The crystallization reagent contained 25% MPD, 0.1 M Tris pH 8.0. An additional 10% MPD (35% final concentration) was added as a cryoprotectant. The crystals were indexed in the orthorhombic space group F222 (Table I).
Native diffraction data were collected to 2.3 Å resolution at the Advanced Light Source (ALS, Berkeley, CA) on beamline 5.0.3. The dataset was collected at 100 K using an ADSC Q4 CCD detector. Data were integrated and reduced using Mosflm8 and then scaled with the program SCALA from the CCP4 suite.9 Data statistics are summarized in Table I. Diffraction was anisotropic, resulting in incomplete data in the higher resolution shells. Intensity falloff is greatest in the b* axis direction. Because the data are incomplete to 2.3 Å, we define the nominal resolution as 2.6 Å, which is the resolution of a dataset that is 100% complete and has the same number of reflections as observed in the current dataset.10 There are 739 observed reflections between 2.6 and 2.3 Å (39% complete for this shell). The structure was determined with the JCSG molecular replacement pipeline11 using E. coli SSB protein (PDB: 1sru, sequence identity 34%) as the search model. Refinement was carried out using RESOLVE,12 REFMAC5,13 and XFIT.14 Refinement statistics are summarized in Table I. Analysis of the stereochemical quality of the model was accomplished using the AutoDepInputTool,15 MolProbity,16 SFcheck 4.0,9 and WHATIF 5.0.17 Figure 1(B) was adapted from an analysis using PDBsum,18 and all others were prepared with PyMOL (DeLano Scientific). Atomic coordinates and experimental structure factors of TM0604 have been deposited within the PDB and are accessible under the code 1z9f. Crystal structure of SSB from Thermotoga maritima. (A) Stereo ribbon diagram of TM0604 monomer color-coded from N-terminus (blue) to C-terminus (red). The N-terminus corresponding to the actual start of SSB is labeled. The α-helix H1 and β-strands (β1–β6) are labeled. Disordered regions are indicated by dashed lines. (B) Diagram showing the secondary structural elements of TM0604 superimposed on its primary sequence. 
The α-helix, β-strands, β-turns, and disordered regions, with corresponding sequence in brackets, are indicated. The three-dimensional structure of TM0604 [Fig. 1(A)] was determined to a nominal resolution of 2.60 Å by the molecular replacement (MR) method using the coordinates of the E. coli SSB protein (PDB: 1sru, sequence identity of 34%)19 as the search model. Data collection, model, and refinement statistics are summarized in Table I. The final model contains only 88 of the 141 residues and includes one monomer (residues 1–23, 26–37, 49–85, and 93–108), one residue from the His-tag, and 14 water molecules. No electron density was observed for residues 24 and 25, 38 to 48, 86 to 92, 109 to 141, and the remaining residues of the expression and purification tag. The Matthews' coefficient (Vm)20 is 1.94 Å3/Da, and the estimated solvent content is 36.2%. The Ramachandran plot, produced by MolProbity,16 shows that 93.8%, 97.5%, and 2.5% of the residues are in favored, allowed, and disallowed regions, respectively. The outlier residues are Met1 (ϕ = 172.4° and ψ = −57.8°) and Ser2 (ϕ = 104.7° and ψ = 19.9°), which are located at the N-terminus, but have good electron density. The TM0604 monomer contains six β-strands (β1–β6) and one α-helix (H1) [Fig. 1(A,B)], with about 37% of the structure being disordered. The total β-strand and α-helical content is 34.1% and 8.4%, respectively. The Structural Classification of Proteins database (SCOP) classifies this protein as an oligonucleotide-binding fold (OB-fold).21 The OB-fold comprises a closed, five-stranded β-barrel architecture (β1, β3–β6) that packs, in the case of TM0604, against one α-helix (H1) and an additional β-strand (β2) [Fig. 1(A)]. Analytical size exclusion chromatography in combination with static light scattering indicates the oligomeric state to be a tetramer. 
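The residue accounting for the TM0604 model can be checked with a few lines of arithmetic. The sketch below takes the modeled segments and the chain length as stated in the text and derives the number of modeled residues, the unmodeled gaps, and the disordered fraction. The function names are illustrative only and are not part of any JCSG tool.

```python
def count_residues(ranges):
    """Total residues covered by inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def missing_ranges(ranges, first, last):
    """Inclusive ranges within [first, last] that `ranges` does not cover."""
    gaps = []
    next_expected = first
    for start, end in sorted(ranges):
        if start > next_expected:
            gaps.append((next_expected, start - 1))
        next_expected = max(next_expected, end + 1)
    if next_expected <= last:
        gaps.append((next_expected, last))
    return gaps


# Modeled segments and chain length as reported for TM0604.
MODELED = [(1, 23), (26, 37), (49, 85), (93, 108)]
CHAIN_LENGTH = 141

modeled = count_residues(MODELED)            # 88
disordered = CHAIN_LENGTH - modeled          # 53
print(modeled, "of", CHAIN_LENGTH, "residues modeled")
print("gaps:", missing_ranges(MODELED, 1, CHAIN_LENGTH))
print(f"{100 * disordered / CHAIN_LENGTH:.1f}% of the chain is disordered")
```

The derived gaps (24–25, 38–48, 86–92, and 109–141) match the residues for which no electron density was reported, and the disordered fraction of 53/141 agrees with the stated value of about 37%.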
A tetramer is also consistent with the analysis of crystallographic packing using the PQS server,22 as well as with what has been reported for SSB from E. coli (PDB: 1eyg) [Fig. 2(A,B)]. (A) Stereo ribbon diagram of a superposition of the TM0604 tetramer (blue) and an SSB-ssDNA complex from E. coli (gray). (B) Stereo view of a model of a TM0604-ssDNA complex in surface representation with ssDNA depicted as sticks. The model is based on the SSB-ssDNA complex from E. coli. The surface is colored according to electrostatic potential in a range where red is negative (−84 kT/e) and blue is positive (+84 kT/e). A structural similarity search, performed with the coordinates of TM0604 using the DALI server,23 identifies the structure of SSB from Mycobacterium tuberculosis (PDB: 1ue1),24 with an RMSD of 1.3 Å over 86 aligned Cα atoms and a sequence identity of 30%. A superposition of TM0604 with SSB from E. coli (PDB: 1sru)19 reveals an RMSD of 1.5 Å over 79 aligned Cα atoms, whereas an alignment with an SSB-ssDNA complex from E. coli (PDB: 1eyg)4 gives an RMSD of 1.6 Å over 79 aligned Cα atoms, with a sequence identity of 34% in both cases. A structural comparison between tetramers of TM0604 and the SSB-ssDNA complex from E. coli reveals a similar tetrameric architecture and a dramatic decrease of disorder in the loops spanning residues 23 to 26, 37 to 49, and 85 to 93 upon binding to ssDNA25 [Fig. 2(A)]. Of the residues implicated in critical interactions with ssDNA in E. coli SSB (Arg3, Gly15, Trp40, Trp54, Phe60, Lys62, Tyr70, Lys73, Trp88, Met109, and Met111), only Phe60, Tyr70, Lys73, Trp88, and Met109 are strictly conserved in TM0604. Nevertheless, a superposition of TM0604 with the SSB-ssDNA complex shows good agreement of surface topography and electrostatic potential and suggests a very similar ssDNA binding mode for TM0604 [Fig. 2(B)]. Currently, the SSB protein family contains more than 400 sequence homologues, mainly of bacterial, eukaryotic, and viral origin.
Models for TM0604 homologues can be accessed at http://www1.jcsg.org/cgi-bin/models/get_mor.pl?key=TM0604. The TM0604 structure reported here represents an SSB protein, whose structure has been determined by X-ray crystallography. The information reported here, in combination with further biochemical and biophysical studies, will yield valuable insights regarding the role of this protein in DNA replication, recombination, and repair. This work was supported by NIH Protein Structure Initiative grants P50-GM 62411 and U54 GM074898 from the National Institute of General Medical Sciences (http://www.nigms.nih.gov). Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL) and the Advanced Light Source (ALS). The SSRL is a national user facility operated by Stanford University on behalf of the U. S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences). The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the U. S. Department of Energy under Contract No. DE-AC03-76SF00098 at Lawrence Berkeley National Laboratory.
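The nominal resolution quoted for the anisotropic TM0604 data set (2.6 Å for data extending to 2.3 Å) can be illustrated with a short calculation. The sketch below assumes that the number of unique reflections inside a resolution sphere scales as 1/d³, which is a standard approximation and not a statement of how the authors computed their value. The function names are hypothetical, and the overall completeness is derived here from the two resolutions because the text does not quote it.

```python
def nominal_resolution(d_min, completeness):
    """Resolution (in Angstrom) of a 100% complete data set holding the same
    number of reflections as a data set to d_min with the given overall
    completeness (0 < completeness <= 1), assuming counts scale as 1/d**3."""
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be in (0, 1]")
    return d_min / completeness ** (1.0 / 3.0)


def implied_completeness(d_min, d_nominal):
    """Overall completeness to d_min implied by a nominal resolution,
    under the same 1/d**3 assumption."""
    return (d_min / d_nominal) ** 3


# Resolutions stated in the text; the completeness below is an estimate only.
D_MIN = 2.3
D_NOMINAL = 2.6

completeness = implied_completeness(D_MIN, D_NOMINAL)
print(f"implied overall completeness to {D_MIN} A: {100 * completeness:.0f}%")
print(f"round trip: {nominal_resolution(D_MIN, completeness):.2f} A")
```

Under this approximation, a nominal resolution of 2.6 Å for data measured to 2.3 Å corresponds to an overall completeness of roughly 69%, which is in line with the poorly measured 2.6 to 2.3 Å shell (39% complete).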
The crystal structure of the Bacteroides thetaiotaomicron protein BT_3984 was determined to a resolution of 1.7 Å and was the first structure to be determined from the extensive SusD family of polysaccharide-binding proteins. SusD is an essential component of the sus operon that defines the paradigm for glycan utilization in dominant members of the human gut microbiota. Structural analysis of BT_3984 revealed an N-terminal region containing several tetratricopeptide repeats (TPRs), while the signature C-terminal region is less structured and contains extensive loop regions. Sequence and structure analysis of BT_3984 suggests the presence of binding interfaces for other proteins from the polysaccharide-utilization complex.
The TM1112 gene of Thermotoga maritima encodes a conserved hypothetical protein with a molecular weight of 10,626 Da (residues 1–89) and a calculated isoelectric point of 5.5. Currently, no functional annotation has been made for this protein, but fold recognition methods, such as the Fold and Function Assignment System (FFAS),1 recognized significant sequence similarity to the family of cupins.2 Here, we report the crystal structure of TM1112, which was determined using the semiautomated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).3 The structure of TM1112 [Fig. 1(A)] was determined to 1.83 Å resolution by the molecular replacement (MR) method using the TM1112 NMR structure as the search model (PDB: 1lkn). Data collection, model, and refinement statistics are summarized in Table I. The final model includes two protein monomers (residues 2–89), two unknown ligands (UNL), and 332 water molecules. The Matthews' coefficient (Vm) for TM1112 is 2.48 Å3/Da and the estimated solvent content is 50.0%. The Ramachandran plot, produced by Procheck 3.4,4 shows that 96.8% of the residues are in the most favored regions and 3.2% in additional allowed regions. Crystal structure of TM1112. A: Ribbon diagram of Thermotoga maritima TM1112 color coded from N-terminus (blue) to C-terminus (red) showing the domain organization viewed along (left) and normal (right) to the barrel axis. Helices (H1, H2), β-sheets (A and A′), and β-strands (β1–β7) are indicated. B: Diagram showing the secondary structure elements in TM1112 superimposed on its primary sequence. The β-sheets are indicated by a red A or A′ and the β-hairpin is depicted as red loops. Residues adjacent to the unknown ligand (UNL) molecule are marked with red dots (also see Fig. 2). A: The proposed active site of TM1112 is depicted with the unknown ligand molecule (UNL) bound to Lys84 and its coordinating residues (Trp24, Trp33, Glu39, Cys41, Tyr35, and Trp76) in ball and stick.
B: Close up view of the active site with a 2Fo–Fc map around Lys84, the covalently-bound UNL and Cys41 contoured at 1σ (marine blue). The atoms are indicated as follows: carbon (grey), oxygen (red), nitrogen (blue), sulfur (yellow), and UNL (pink). Potential covalent bonds for the UNL ligand are represented as dashed pink lines, but until ligand identification, these are quite speculative. The TM1112 monomer is composed of seven β-strands (β1–β7), one α-helix (H1), and one short 310-helix (H2). The total β-strand content is 59.1%. The TM1112 structure is characterized by an antiparallel β-sheet that forms a jelly roll β-sandwich with a topology that is reminiscent of the cupin barrel fold2 [Fig. 1(A)]. The seven-stranded β-sheet (β1–β7) can be viewed as composed of two connected β-sheets, A with 16472 topology and A′ with 3745 topology, fused together via two strongly bent β-strands β4 and β7 [Fig. 1(A)]. Because of this variation, the Structural Classification of Proteins database (SCOP)5 classified TM1112 as a new subfamily of RmlC-like cupins. The root-mean-square deviation (RMSD) between the crystal structure and the averaged NMR structure (PDB: 1lkn) of TM1112 is 1.3 Å over 88 aligned residues. Both structures indicate that a monomer is the biologically-relevant form of TM1112. An alignment of the TM1112 sequence with homologous-cupin-like sequences, derived from a FFAS1 search, identifies a cluster of strictly conserved residues (Trp24, Trp33, Glu39, Cys41, Tyr35, Trp76, and Lys84) located in the center of the β-barrel [Fig. 2(A)]. The conservation of these side-chains within a groove in the center of the β-barrel indicates a proposed location for the TM1112 active site. SigmaA-weighted OMIT maps show additional compact density contiguous with the side-chain amino group of Lys84 [Fig. 2(B)]. Connecting density also suggests a hydrogen bond (distance 2.75 Å) between Lys84 and the adjacent sulfhydryl group of Cys41. 
Despite extensive model building and database searching, the density could not be unambiguously interpreted and was, therefore, modeled as an unknown ligand (UNL) consisting of five atoms covalently bound to Lys84, which suggests a catalytic role for Lys84 and Cys41. The apparent covalent nature of this adduct suggests either a post-translational modification or interaction with an unknown substrate. However, we were unable to identify a similar active site configuration in the PDB, which indicates that TM1112 represents a functionally novel enzyme from the cupin family. Clearly, further work is needed to define the enzymatic activity and mechanism of these cupins. A structural similarity search, performed with the coordinates of TM1112 using the DALI server,6 indicated that the closest structural homologue is quercetin 2,3-dioxygenase, an RmlC-like cupin from Aspergillus japonicus (PDB: 1juh),7 with an RMSD of 2.4 Å over 82 aligned residues with 12% sequence identity. Another structural homologue is the N-terminal domain of the Catabolite Gene Activator Protein (CAP) from Escherichia coli (PDB: 2cgp),8 where the RMSD is 2.7 Å over 79 aligned residues with 11% sequence identity. According to FFAS,1 TM1112 has four distant homologues in the Thermotoga maritima proteome: TM1010 with 10% sequence identity, TM1287 (14%), TM1459 (10%), and TM0656 (14%). Sequence similarity searches with the TM1112 sequence against the non-redundant protein sequence database (NCBI) revealed more than one hundred homologues in prokaryotes and eukaryotes, all of which are designated as conserved hypothetical proteins. This new cupin sub-family comprises single-domain proteins like TM1112, as well as multi-domain proteins. Models for TM1112 homologues can be accessed at http://www1.jcsg.org/cgi-bin/models/get_mor.pl?key=TM1112. The crystal structure reported here represents a novel enzyme from the cupin family that was determined by MR using the TM1112 NMR structure as a template.
The information reported here, in combination with further biochemical and biophysical studies, will yield valuable insights into the functional determinants of this protein and the thermostability of these organisms. TM1112 (TIGR: TM1112; Swissprot: Q9X0J6) was amplified by polymerase chain reaction (PCR) from Thermotoga maritima strain MSB8 genomic DNA using PfuTurbo (Stratagene) and primer pairs encoding the predicted 5′- and 3′-ends of TM1112. The PCR product was cloned into plasmid pMH1, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The cloning junctions were confirmed by sequencing. Protein expression was performed in a modified Terrific Broth [24 g/liter yeast extract, 12 g/liter tryptone, 1% (v/v) glycerol, 50 mM 3-(N-Morpholino)propanesulfonic acid (MOPS), pH 7.6] using the E. coli strain GeneHogs® (Invitrogen). Lysozyme was added to the culture at the end of fermentation to a final concentration of 1 mg/ml. Bacteria were lysed by sonication after a freeze-thaw procedure in Lysis Buffer [50 mM Tris, pH 7.9, 50 mM NaCl, 1 mM MgCl2, 5 mM 2-Mercaptoethanol, 3 mM DL-methionine, 2.5 U/ml Benzonase® (Sigma)], and cell debris pelleted by centrifugation at 3400 × g for 60 min. The soluble fraction was applied to a nickel-resin (Amersham Biosciences) pre-equilibrated with Equilibration Buffer (50 mM potassium phosphate, pH 7.8, 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP), 10% (v/v) glycerol, 400 mM NaCl, 100 mM KCl, 20 mM imidazole, 3 mM DL-methionine). The nickel-resin was washed with Equilibration Buffer, and the protein eluted with Elution Buffer (20 mM Tris, pH 7.9, 10% (v/v) glycerol, 0.25 mM TCEP, 200 mM imidazole, 3 mM DL-methionine). The eluate was buffer exchanged into Crystallization Buffer (20 mM Tris, pH 7.9, 150 mM NaCl, 0.25 mM TCEP) and concentrated for crystallization assays to 19 mg/ml by centrifugal ultrafiltration (Millipore). 
The protein was crystallized using the nanodroplet vapor diffusion method9 with standard JCSG crystallization protocols.3 Crystals grew in Hampton Crystal Screen Cryo #31 [25.5% polyethylene glycol (PEG) 4000, 15% glycerol, and 0.17 M (NH4)2SO4]. The crystals were indexed in the monoclinic space group P21 (Table I). Native diffraction data were collected on beamline 9-1 at the Stanford Synchrotron Radiation Laboratory (SSRL, Stanford, USA) using the BLU-ICE10 data collection environment (Table I). The dataset was collected at 100 K using a Quantum 315 CCD detector. Data were integrated and reduced using Mosflm11 and then scaled with the program SCALA from the CCP4 suite.12 Data statistics are summarized in Table I. The structure was determined by molecular replacement using the program MOLREP from the CCP4 suite.12 The ten models from the NMR structures of TM1112 (PDB: 1lkn), solved by the Northeast Structural Genomics Consortium,13 were used as search models. The correct solution could only be obtained with model number 9 and gave an Rfree = 0.48 and Rcryst = 0.46 after initial rigid body and restrained refinement in REFMAC5.12 Structure refinement was performed using TLS refinement in REFMAC5,12 O,14 and Xfit.15 Refinement statistics are summarized in Table I. The final model includes two protein monomers (residues 2–89), two unknown ligands (UNL), and 332 water molecules in the asymmetric unit. No electron density was observed for the expression or purification tag. Analysis of the stereochemical quality of the models was accomplished using Procheck 3.4,4 SFcheck 4.0,12 and WHAT IF 5.0.16 Figure 1(B) was adapted from an analysis using PDBsum (http://www.biochem.ucl.ac.uk/bsm/pdbsum/) and all others were prepared with PyMOL (DeLano Scientific). Atomic coordinates of the final model and experimental structure factors of TM1112 have been deposited with the PDB and are accessible under the code 1o5u.
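The Rcryst and Rfree values quoted for the initial molecular replacement solution are both instances of the conventional crystallographic R-factor, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, evaluated over the working reflections and over a held-out test set, respectively. The sketch below is a minimal illustration of that formula. The amplitudes are hypothetical numbers chosen for easy arithmetic, the calculated amplitudes are assumed to be already scaled to the observed ones, and none of this reproduces REFMAC5 internals.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Both sequences hold structure-factor amplitudes for the same reflections,
    with f_calc assumed to be on the same scale as f_obs.
    """
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical amplitudes (not taken from the TM1112 data).
work_obs, work_calc = [100.0, 80.0, 60.0, 40.0], [90.0, 88.0, 54.0, 44.0]
free_obs, free_calc = [50.0, 30.0], [40.0, 36.0]

r_cryst = r_factor(work_obs, work_calc)   # working set, used in refinement
r_free = r_factor(free_obs, free_calc)    # test set, excluded from refinement
print(f"Rcryst = {r_cryst:.2f}, Rfree = {r_free:.2f}")
```

Because the test-set reflections are excluded from refinement, Rfree is normally somewhat higher than Rcryst, as in the values reported above.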
This work was supported by NIH Protein Structure Initiative grant P50-GM 62411 from the National Institute of General Medical Sciences (www.nigms.nih.gov). Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory, a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences).
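The solvent contents quoted alongside the Matthews coefficients in these reports can be approximated with the widely used relation: solvent fraction ≈ 1 − 1.23/Vm, with Vm in Å3/Da. The sketch below applies that relation to the reported Vm values. The constant 1.23 assumes a typical protein partial specific volume of about 0.74 cm3/g, which is an assumption on our part, since the texts do not state which value was used, so small deviations from the reported percentages are expected.

```python
def solvent_fraction(vm, constant=1.23):
    """Estimated solvent fraction from the Matthews coefficient Vm (A^3/Da).

    `constant` (about 1.23 A^3/Da) corresponds to a typical protein partial
    specific volume; it is an assumed value, not one quoted in the text.
    """
    if vm <= 0:
        raise ValueError("Vm must be positive")
    return 1.0 - constant / vm


# (Vm in A^3/Da, reported solvent content in %) as stated in the text.
REPORTED = {
    "TM0604": (1.94, 36.2),
    "TM1112": (2.48, 50.0),
    "TM0574": (3.66, 66.1),
}

for name, (vm, reported_pct) in REPORTED.items():
    estimate_pct = 100.0 * solvent_fraction(vm)
    print(f"{name}: Vm = {vm:.2f}, estimated {estimate_pct:.1f}%, "
          f"reported {reported_pct:.1f}%")
```

With this constant, each estimate falls within half a percentage point of the reported solvent content.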
After more than two decades of research and development, adeno-associated virus (AAV) has become one of the dominant delivery vectors in gene therapy. Despite the focused research, the cell entry pathway for AAV is still not fully understood. The universal AAV receptor (AAVR) has been identified as being involved in cellular entry of different AAV serotypes. With the unveiling of the high-resolution AAV-AAVR complex structure by cryogenic electron microscopy, the atomic level interaction between AAV and AAVR has become the focus of study in recent years. However, the serotype dependence of this binding interaction and the effect of pH have not been studied. Here, orthogonal approaches including bio-layer interferometry (BLI), size-exclusion chromatography coupled to multi-angle light scattering (SEC-MALS) and sedimentation velocity analytical ultracentrifugation (SV-AUC) were utilized to study the interaction between selected AAV serotypes and AAVR under different pH conditions. A robust BLI method was developed and the equilibrium dissociation constants (KD) between different AAV serotypes (AAV1, AAV5 and AAV8) and AAVR were measured. The binding constants measured by BLI together with the orthogonal methods (SEC-MALS and SV-AUC) all confirmed that AAV5 has the strongest binding affinity, followed by AAV1, while AAV8 binds the weakest. It was also observed that lower pH promotes the binding between AAV and AAVR, whereas neutral or slightly basic conditions lead to very weak binding. These data indicate that for certain serotypes, AAVR may play a prominent role in trafficking AAV to the Golgi rather than acting as a host cell receptor. Information obtained from these combinatorial biophysical methods can be used to engineer future generations of AAVs to have better transduction efficiency.
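For readers less familiar with how an equilibrium dissociation constant relates to binding kinetics, the sketch below shows the simplest case, a 1:1 binding model in which KD = koff/kon and the equilibrium fraction of occupied sites is C/(KD + C). The abstract does not state which binding model was fitted to the BLI data, so the 1:1 model is an assumption, and the rate constants are hypothetical values that do not correspond to any AAV serotype.

```python
def dissociation_constant(k_on, k_off):
    """K_D in M from the association rate constant k_on (1/(M*s)) and the
    dissociation rate constant k_off (1/s), assuming 1:1 binding."""
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def fraction_bound(conc, k_d):
    """Equilibrium fraction of binding sites occupied at analyte
    concentration `conc` (M) for a 1:1 interaction with constant k_d (M)."""
    if conc < 0 or k_d <= 0:
        raise ValueError("conc must be non-negative and k_d positive")
    return conc / (k_d + conc)


# Hypothetical rate constants, for illustration only.
K_ON = 1.0e5    # 1/(M*s)
K_OFF = 1.0e-3  # 1/s

k_d = dissociation_constant(K_ON, K_OFF)
print(f"K_D = {k_d:.1e} M")
for conc in (0.1 * k_d, k_d, 10 * k_d):
    print(f"[analyte] = {conc:.1e} M -> fraction bound = "
          f"{fraction_bound(conc, k_d):.2f}")
```

In this framework a smaller KD means tighter binding, which is the sense in which the serotypes are ranked above.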
The TM0574 gene of Thermotoga maritima encodes an S-adenosylmethionine:tRNA ribosyltransferase-isomerase (QueA), also known as queuosine (Q) biosynthesis protein, with a molecular weight of 38,529 Da (residues 1–335) and a calculated isoelectric point of 8.61. This enzyme catalyzes the formation of the 2,3-epoxy-4,5-dihydroxycyclopentane ring of the Q precursor epoxyqueuosine (oQ) [Fig. 1(A)]. S-adenosyl-L-methionine (AdoMet) reacts with 7-aminomethyl-7-deazaguanine of tRNA at position 34 to yield adenine, methionine, and a modified tRNA with oQ at position 34.1 The epoxy-cyclopentenediol moiety of oQ originates from the ribosyl portion of AdoMet, which is the only known example of the stoichiometric use of AdoMet as a ribosyl donor in an enzymatic reaction.2 Queuosine {7-[((4,5-cis-dihydroxy-2-cyclopentene-1-yl)-amino)-methyl]-7-deazaguanosine} is a hypermodified nucleoside that occurs at position 34, the anticodon wobble position, of aspartate, asparagine, tyrosine, and histidine tRNAs.3 Although Q is found in bacteria and eukaryotes, only bacteria are capable of its de novo synthesis. Eukaryotes obtain Q as a dietary nutrient or from the intestinal flora, and incorporate it using the enzyme queuine-tRNA ribosyltransferase.2 Queuine functions as a growth modulator for higher eukaryotes that significantly reduces overall protein synthesis in mammalian cells, and modulates the phosphorylation of specific proteins.4 Queuine deficiency has been observed in tRNAs from ovarian tumors,5 whereas increased levels of Q have been observed in human leukemic cells.6 Here, we report the crystal structure of TM0574, that was determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).7 Reaction catalyzed and crystal structure of QueA. (A) Scheme of the QueA reaction (see text). (B) Stereo ribbon diagram of TM0574 color-coded from N-terminus (blue) to C-terminus (red) showing the domain organization. 
Helices H1–H10 and β-strands β1–β15 are indicated. Disordered regions are depicted by a dashed line. (C) Diagram of chain A showing the secondary structure elements in TM0574 superimposed on its primary sequence. The α-helices, 310-helices, β-bulges, and γ-turns are indicated. The β-sheet strands are indicated by a red A and B. β-hairpins are depicted as red loops. Disordered regions are depicted by a dashed line, with the corresponding sequence in brackets. The structure of TM0574 [Fig. 1(B)] was determined to 2.00 Å resolution using the multiwavelength anomalous dispersion (MAD) method. Data collection, model, and refinement statistics are summarized in Table I. The final model includes 2 protein monomers (residues 4–72, 76–142, 174–201, and 220–335 in chain A and 4–72, 75–142, 174–203, and 220–335 in chain B), 2 unknown ligand (UNL) molecules, and 174 water molecules. The Matthews' coefficient (Vm)10 for TM0574 is 3.66 Å3/Da, and the estimated solvent content is 66.1%. The Ramachandran plot, produced by MolProbity,11 shows that 98.2%, 1.2%, and 0.6% of the residues are in favored, allowed, and disallowed regions, respectively. The disallowed residues, Ser77 (ϕ = 56.3°, ψ = 106.6°) and Glu221 (ϕ = 83.3°, ψ = 104.2°) in chain A, and Glu221 (ϕ = 71.8°, ψ = 110.6°) in chain B, are adjacent to disordered regions of the structure [Fig. 1(B and C)]. The TM0574 monomer contains 15 β-strands (β1–β15), 6 α-helices (H4, H6–H10), and four 310-helices (H1–H3, H5) [Fig. 1(B and C)]. The total β-strand, α-helical, and 310-helical content is 32.5%, 22.5%, and 6.1%, respectively. TM0574 folds into 2 domains: a large α/β/α domain (residues 1–64 and 143–335) and a small, inserted β-barrel domain (residues 65–142) [Fig. 1(B)]. The large domain features a central, 9-stranded β-sheet surrounded by 9 helices. The twisted β-sheet (β1–β3 and β10–β15) is of the mixed type and has 219863457 topology.
The small domain folds into a closed, 6-stranded, Greek-key β-barrel with 123654 topology. The openings of the barrel are closed by the linker between β6 and β7 and helix H4. Both the crystallographic packing of the TM0574 structure and analytical size exclusion chromatography indicate that a monomer is the biologically-relevant form. A structural similarity search, performed with the coordinates of TM0574 using the DALI server,12 showed no matches for the large α/β/α domain, indicating that it is a new fold and the first structure of a QueA protein family member (PF02547). The small β-barrel domain (residues 65–142) matches the N-terminal domain of F1 ATPase (residues 1–82) from the thermophilic bacterium Bacillus PS3 [Protein Data Bank (PDB) code: 1sky].13 The root-mean-square deviation (RMSD) for this structural alignment is 2.5 Å over 60 aligned residues with 13% sequence identity [Fig. 2(A)]. (A) Superposition of the small domain of TM0574 (65–142; gray) and the N-terminal domain (residues 1–82) of F1 ATPase from the thermophilic bacterium Bacillus PS3 (PDB code: 1sky, purple). (B) Surface representation of TM0574. Residues are colored according to sequence conservation, where green is conserved and white is nonconserved. The UNL is shown as pink spheres. (C) Close-up view of the putative TM0574 active site shown in ribbon representation. The UNL (purple), interacting residues (see text), and 3 potential waters near the UNL moiety are depicted in ball-and-stick representation. An Fo–Fc omit map contoured at 2.0 σ is shown for the UNL and coordinating waters. Currently, the QueA family contains 118 sequence homologs, found exclusively in bacteria. Models for TM0574 homologs can be accessed at http://www1.jcsg.org/cgi-bin/models/get_mor.pl?key=TM0574.
A surface map of conserved residues from an alignment of available QueA sequences reveals an extended cluster of conserved residues in the large domain, which indicates a possible location for the tRNA binding site [Fig. 2(B)]. In addition, a SigmaA-weighted omit map shows additional compact density on this conserved face of each monomer within a shallow pocket on the edge of the β-sheet in the large domain [Fig. 2(C)], proximal to the side-chains of Thr174, Thr249, Thr250, Thr289, Asn290, and His292. The electron density of the putative ligand suggests a moiety that contains a 6-membered ring with additional atoms, reminiscent of a nucleotide moiety. Despite extensive modeling efforts, the density could not be completely identified and was, therefore, modeled as a UNL. The proposed active site is formed by 2 "strand–loop–helix" segments and a loop region (Phe325–Asp330) connecting α-helix H10 to the C-terminal β-strand β15. The strand–loop–helix motifs (Ile244–Ile258 and Leu287–Phe305) contain several conserved residues and are close to the bound ligand. Further analysis of the putative active site regions indicates that Gly248, Thr249, Thr250, Arg253, Thr289, Asn290, His292, Leu298, Phe326 and Ser327 are potential ligand-binding residues. All of these residues are well conserved among QueA sequences. Furthermore, one of the conserved regions (residues 143–173), which links the active site to the small β-domain, is disordered in the structure. The presence of several conserved residues within this linker suggests a key role in stabilizing the enzyme:tRNA complex. Using extensive DALI searches12 and manual structure comparison, we were unable to identify a similar active site configuration in known tRNA-binding proteins, which indicates that QueA contains a functionally novel active site. The TM0574 structure reported here contains a new fold and represents the first QueA protein whose structure has been determined by X-ray crystallography. 
The information reported here, in combination with further biochemical and biophysical studies, will yield valuable insights into the functional role of queuosine biosynthesis protein (QueA). TM0574 (TIGR: TM0574; Swiss-Prot: Q9WZ44) was amplified by polymerase chain reaction (PCR) from genomic DNA using PfuTurbo (Stratagene) and primer pairs encoding the predicted 5′- and 3′-ends. The PCR product was cloned into plasmid pMH1, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The cloning junctions were confirmed by sequencing. Protein expression was performed in a selenomethionine-containing medium using the Escherichia coli methionine auxotrophic strain DL41. Lysozyme was added to the culture at the end of fermentation to a final concentration of 250 μg/mL. Bacteria were lysed by sonication after a freeze-thaw procedure in Lysis Buffer [50 mM Tris, pH 7.9, 50 mM NaCl, 10 mM imidazole, 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)], and the cell debris was pelleted by centrifugation at 3400 × g for 60 min. The soluble fraction was applied to a nickel-resin (Amersham Biosciences) pre-equilibrated with Lysis Buffer. The nickel-resin was washed with Wash Buffer [50 mM potassium phosphate, pH 7.8, 40 mM imidazole, 300 mM NaCl, 10% (v/v) glycerol, 0.25 mM TCEP], and the protein was eluted with Elution Buffer [20 mM Tris pH 7.9, 300 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP]. Buffer exchange was performed to remove imidazole from the eluate, and the protein in Buffer Q [20 mM Tris, pH 7.9, 5% (v/v) glycerol, 0.25 mM TCEP] containing 50 mM NaCl was applied to a RESOURCE Q column (Amersham Biosciences) pre-equilibrated with the same buffer. 
The RESOURCE Q flow-through fraction was pooled, further purified using a Superdex 200 size exclusion column (SEC; Amersham Biosciences) with elution in Crystallization Buffer [20 mM Tris, pH 7.9, 150 mM NaCl, 0.25 mM TCEP], and concentrated for crystallization trials to 14 mg/mL by centrifugal ultrafiltration (Millipore). Native protein was prepared using the same protocol with the following exceptions: After elution from the SEC column in Crystallization Buffer, the protein was buffer-exchanged into 20 mM Tris, pH 7.9, 50 mM NaCl, 0.25 mM TCEP and concentrated for crystallization trials to 12 mg/mL. Analytical SEC was performed as follows. Molecular weight and oligomeric state of TM0574 were estimated using a 1.0 × 30 cm Superdex 200 column (Amersham Biosciences) pre-equilibrated in 20 mM Tris, pH 7.9, 150 mM NaCl, 1 mM TCEP and calibrated with gel filtration standard (Bio-Rad). The protein was crystallized initially using the nanodroplet vapor diffusion method,14 using standard JCSG crystallization protocols,7 and further optimized by hanging drop vapor diffusion. The crystallization solution contained 12% polyethylene glycol (PEG-4000), 0.1 M citric acid at pH 5.0; 35% glycerol (final concentration) was included as a cryoprotectant. The crystals were indexed in the orthorhombic space group I222 (Table I). MAD data were collected to 2.9 Å resolution at the Advanced Photon Source (APS, Chicago, IL) on beamline SBC-19-ID at wavelengths corresponding to the inflection point (λ1), low energy remote (λ2), and the peak (λ3) of a selenium MAD experiment. In addition, a 2.0 Å data set (λ0) was collected on beamline 8.2.2 (ALS, Berkeley, CA). The data sets were collected at 100 K using APS SBC2 or Quantum 210 charge-coupled device (CCD) detectors. MAD data were integrated, reduced, and scaled using HKL2000.15 Native data were integrated and reduced using Mosflm16 and then scaled with the program SCALA from the CCP4 suite.8 Data statistics are summarized in Table I. 
The initial structure was determined with the 2.9 Å selenium MAD data (λ1,2,3) using the CCP4 suite8 and SOLVE/RESOLVE.17 Model building and refinement were performed with the 2.0 Å data set (λ0) using O18 and REFMAC5.8 Refinement statistics are summarized in Table I. The final model includes 2 protein monomers, an unidentified ligand in each of the active sites, and 174 water molecules in the asymmetric unit. No electron density was observed for residues 1–3, 73–75, 143–173, and 202–219 in chain A; residues 1–3, 73–74, 143–173, and 204–219 in chain B; or the expression and purification tags in both chains. Analysis of the stereochemical quality of the model was accomplished using the AutoDepInputTool (http://deposit.pdb.org/adit/), MolProbity,11 SFcheck 4.0,8 and WHAT IF 5.0.19 Protein quaternary structure analysis was performed using the PQS server (http://pqs.ebi.ac.uk/). Figure 1(B) was adapted from PDBsum (http://www.biochem.ucl.ac.uk/bsm/pdbsum/), and all others were prepared with PYMOL (DeLano Scientific). Atomic coordinates and experimental structure factors of TM0574 have been deposited within the PDB and are accessible under the code 1vky. Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL), the Advanced Light Source (ALS), and the Advanced Photon Source (APS). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences). The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the U.S. Department of Energy under Contract No. 
DE-AC03-76SF00098 at Lawrence Berkeley National Laboratory. Data collection was conducted at the Northeastern Collaborative Access Team beamlines of the Advanced Photon Source, supported by award RR-15301 from the National Center for Research Resources at the National Institutes of Health. Use of the Advanced Photon Source is supported by the U.S. Department of Energy, Office of Basic Energy Sciences, under contract No. W-31-109-ENG-38.
Examination of the genomic context for members of the FmdE Pfam family (PF02663), such as the protein encoded by the fmdE gene from the methanogenic archaeon Methanobacterium thermoautotrophicum, indicates that 13 of them are co-transcribed with genes encoding subunits of molybdenum formylmethanofuran dehydrogenase (EC 1.2.99.5), an enzyme that is involved in microbial methane production. Here, the first crystal structures from PF02663 are described, representing two bacterial and one archaeal species: B8FYU2_DESHY from the anaerobic dehalogenating bacterium Desulfitobacterium hafniense DCB-2, Q2LQ23_SYNAS from the syntrophic bacterium Syntrophus aciditrophicus SB and Q9HJ63_THEAC from the thermoacidophilic archaeon Thermoplasma acidophilum. Two of these proteins, Q9HJ63_THEAC and Q2LQ23_SYNAS, contain two domains: an N-terminal thioredoxin-like α+β core domain (NTD) consisting of a five-stranded, mixed β-sheet flanked by several α-helices and a C-terminal zinc-finger domain (CTD). B8FYU2_DESHY, on the other hand, is composed solely of the NTD. The CTD of Q9HJ63_THEAC and Q2LQ23_SYNAS is best characterized as a treble-clef zinc finger. Two significant structural differences between Q9HJ63_THEAC and Q2LQ23_SYNAS involve their metal binding. First, zinc is bound to the putative active site on the NTD of Q9HJ63_THEAC, but is absent from the NTD of Q2LQ23_SYNAS. Second, whereas the structure of the CTD of Q2LQ23_SYNAS shows four Cys side chains within coordination distance of the Zn atom, the structure of Q9HJ63_THEAC is atypical for a treble-clef zinc finger in that three Cys side chains and an Asp side chain are within coordination distance of the zinc.
Transcriptional regulators play a crucial role in the adaptation of microorganisms to diverse environmental challenges.1-3 Most microbial transcriptional regulators contain an effector binding regulatory domain and a DNA-binding domain that interacts with a specific operator DNA to either prevent (transcriptional repressors) or stimulate (transcriptional activators) transcription of a nearby gene(s).4 Prokaryotic transcriptional regulators have been classified into a number of families based on amino acid sequence similarity and domain architecture.4-8 The tetracycline repressor (TetR) family of proteins exhibits a high degree of sequence similarity at the N-terminal DNA-binding domain (∼50 amino acids), which adopts a helix-turn-helix (HTH) motif. In contrast, the regulatory domain is more variable, possibly reflecting the need to specifically accommodate different effectors.4, 9 TM1030 from Thermotoga maritima, a hyperthermophilic bacterium that typically thrives in high-temperature ecosystems, is a 200 amino acid protein with a molecular weight of 24 kDa and an isoelectric point of 6.25. The N-terminal DNA-binding domain of TM1030 shows sequence similarity to members of the TetR family, but no significant similarity is found for the regulatory C-terminal region (∼150 amino acids). Here, we present the crystal structure of a ligand-bound form of TM1030, which was determined to 2.3 Å resolution, using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG)10 as part of the National Institute of General Medical Sciences (NIGMS)-funded Protein Structure Initiative (PSI). The TM1030 gene (GenBank: AAD36107.1, GI: 4981571, Swiss-Prot: Q9X0C0) from Thermotoga maritima was amplified by polymerase chain reaction (PCR) from genomic DNA using PfuTurbo (Stratagene) and primers corresponding to the predicted 5′- and 3′-ends. 
The PCR product was cloned into plasmid pMH1, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The TM1030 gene uses an alternate start codon (GUG) that results in a valine at position 1 when expressed as a fusion with the expression and purification tag. The cloning junctions were confirmed by DNA sequencing. Protein expression was performed in a selenomethionine-containing medium using the Escherichia coli methionine auxotrophic strain DL41. At the end of fermentation, lysozyme was added to the culture to a final concentration of 250 μg/mL, and the cells were harvested. After one freeze/thaw cycle, the cells were sonicated in Lysis Buffer [50 mM Tris pH 8.0, 50 mM NaCl, 10 mM imidazole, 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)], and the lysate was clarified by centrifugation at 32,500 × g for 30 min. The soluble fraction was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with Lysis Buffer, the resin was washed with Wash Buffer [50 mM potassium phosphate, pH 7.8, 300 mM NaCl, 40 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP], and the protein was eluted with Elution Buffer [20 mM Tris pH 8.0, 300 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP]. The eluate was diluted ten-fold with Buffer Q [20 mM Tris pH 7.9, 50 mM NaCl, 5% (v/v) glycerol, 0.25 mM TCEP] and applied to a RESOURCE Q column (GE Healthcare) pre-equilibrated with the same buffer. The flow-through fraction, which contained TM1030, was further purified on a Superdex 200 column (GE Healthcare), with isocratic elution in Crystallization Buffer [20 mM Tris pH 7.9, 150 mM NaCl, 0.25 mM TCEP]. 
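The composition of the sample loaded onto the ion-exchange column can be checked with simple dilution arithmetic. The following is a minimal sketch, assuming that "ten-fold" means one volume of eluate combined with nine volumes of Buffer Q and that volumes are additive; the function name `mix` and the dictionary layout are illustrative and not part of the published protocol. The small pH difference between the two Tris buffers is not modeled.

```python
# Component concentrations taken from the buffer recipes in the text.
# Units: mM, except glycerol, which is % (v/v).
ELUTION_BUFFER = {"Tris": 20.0, "imidazole": 300.0, "NaCl": 0.0,
                  "glycerol": 10.0, "TCEP": 0.25}
BUFFER_Q = {"Tris": 20.0, "imidazole": 0.0, "NaCl": 50.0,
            "glycerol": 5.0, "TCEP": 0.25}


def mix(sample, diluent, parts_sample=1, parts_diluent=9):
    """Volume-weighted average of two buffers (assumes additive volumes)."""
    total = parts_sample + parts_diluent
    return {
        name: (sample[name] * parts_sample + diluent[name] * parts_diluent) / total
        for name in sample
    }


# Assumed 1:9 mixing ratio for the ten-fold dilution of the eluate.
loaded = mix(ELUTION_BUFFER, BUFFER_Q)
for name, conc in loaded.items():
    print(f"{name}: {conc:g}")
```

Under these assumptions the loaded sample contains about 30 mM imidazole, 45 mM NaCl, and 5.5% (v/v) glycerol, so the imidazole carried over from the elution step is reduced roughly ten-fold before ion exchange.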
The protein was concentrated for crystallization assays to 15 mg/mL by centrifugal ultrafiltration (Millipore) and crystallized using the nanodroplet vapor diffusion method11 with standard JCSG crystallization protocols.10 The crystallization reagent contained 30% (w/v) polyethylene glycol (PEG) 8000, 0.2M Mg(NO3)2, and 0.1M citrate pH 4.5. Ethylene glycol was added as a cryoprotectant to a final concentration of 5% (v/v). Initial screening for diffraction was carried out using the Stanford Automated Mounting system (SAM)12 at the Stanford Synchrotron Radiation Laboratory (SSRL, Stanford, CA). The crystals were indexed in monoclinic space group P21 (Table I). The molecular weight and oligomeric state of TM1030 were determined using a 1 cm × 30 cm Superdex 200 column (GE Healthcare) in combination with static light scattering (Wyatt Technology). The mobile phase consisted of 20 mM Tris pH 8.0, 150 mM NaCl, and 0.02% (w/v) sodium azide. Multi-wavelength anomalous diffraction (MAD) data sets were collected at 100 K using a charge-coupled device detector (ADSC Q315) on SSRL beamline 11-1 using the BLU-ICE13 data collection environment (Table I). Data were collected at wavelengths corresponding to the high energy remote (λ1) and inflection (λ2) of a selenium MAD experiment. Data were indexed and reduced with Mosflm16 and scaled using SCALA from the CCP4 suite.14 Diffraction data statistics are summarized in Table I. The selenium substructure was solved using SOLVE.17 Refinement of the Se sites resulted in a mean figure of merit of 0.39 to a resolution of 2.5 Å. Phase extension to 2.3 Å was performed using RESOLVE,17 with a solvent content of 0.5 and a starting two-fold noncrystallographic symmetry (NCS) matrix derived from the substructure solution. Automatic model building was performed with RESOLVE, resulting in a dimer model containing 288 residues (72%), with 89 (22%) of the side chains fitted. 
This initial model was rebuilt using iterative ARP/wARP runs,18 which built 354 residues (88%), with 345 residues docked into the sequence (86%). Model completion and refinement were performed with the remote (λ1) data set using COOT19 and REFMAC5.20 Refinement statistics are summarized in Table I. Analysis of the stereochemical quality of the structure was accomplished using AutoDepInputTool,21 MolProbity,15 SFcheck 4.0,14 and WHAT IF 5.0.22 Protein quaternary structure analysis was performed using the PQS server.23 Figure 1 was adapted from an analysis using PDBsum,24 and all other figures were prepared with PyMOL (DeLano Scientific). Atomic coordinates and experimental structure factors for TM1030 at 2.3 Å resolution have been deposited in the PDB and are accessible under the code 1zkg. 
Figure 1. Stereo ribbon diagram of the crystal structure of TM1030 monomer. A: The DNA-binding domain and the regulatory domain are colored in green and violet, respectively. The helices, as well as the N- and C-termini, are labeled. B: Schematic diagram showing the secondary structural elements in TM1030 superimposed on its primary sequence. The α-helices and 310-helix (H6A) are indicated. 
The crystal structure of TM1030 (Fig. 1) was determined to 2.3 Å resolution using the MAD method (Table I). The asymmetric unit includes two TM1030 subunits, two unknown ligands (UNLs) and 56 water molecules. Electron density was not observed for residues from the expression and purification tag in both subunits or for residue Val 1 of subunit B. The Matthews' coefficient (Vm)25 for TM1030 is 2.6 Å3/Da, and the estimated solvent content is 52.2%. The Ramachandran plot,26 as produced by MolProbity,27 shows that 97.2 and 99.8% of the main chain torsion angles are in the favored and allowed regions, respectively. The only outlier is the surface-exposed residue R74 (subunit A), which is poorly defined in the electron density map. 
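The link between the Matthews coefficient and the solvent content quoted for TM1030 can be sketched numerically. The snippet below uses the widely applied approximation that the solvent fraction is 1 − 1.23/Vm, which assumes a typical protein partial specific volume of about 0.74 cm³/g; the function names are illustrative, and no unit-cell values are used because the cell parameters are given only in Table I.

```python
def matthews_coefficient(cell_volume_A3, n_asu_per_cell, n_chains_per_asu, chain_mass_da):
    """Vm in Å^3/Da: unit-cell volume divided by the protein mass in the cell."""
    return cell_volume_A3 / (n_asu_per_cell * n_chains_per_asu * chain_mass_da)


def solvent_fraction(vm, constant=1.23):
    """Approximate solvent fraction; 1.23 Å^3/Da assumes ~0.74 cm^3/g protein."""
    return 1.0 - constant / vm


# Vm as reported in the text for TM1030 (2.6 Å^3/Da).
print(f"{solvent_fraction(2.6):.3f}")
```

For Vm = 2.6 Å3/Da this approximation gives a solvent fraction of about 0.53, close to the reported 52.2%. A small difference is expected because Vm is quoted to two significant figures and the exact constant used in the original calculation is not stated.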
TM1030 is an all-helical protein composed of 10 α-helices (H1-H7, H7A-H9) and a 310-helix (H6A), and it adopts a two-domain architecture similar to TetR (Fig. 1). The N-terminal DNA-binding domain is composed of the first three α-helices. The H2 and H3 α-helices of this domain form a canonical HTH motif. The regulatory domain is made up of an antiparallel helical bundle (H4-H5 and H7-H9) and helix H6, which is packed nearly orthogonal to the long axis of this helical bundle. A DALI28 search revealed structural similarity to several microbial transcriptional regulators. The top 13 hits in the search belong to the TetR family and include proteins from Pseudomonas aeruginosa (PDB accession code: 2gen, 2fbq, 2fd5), Salmonella typhimurium (1t33), Staphylococcus aureus (1jty), Mycobacterium tuberculosis (1t56), Rhodococcus sp. (2gfn, 2g7g), Bacillus cereus (1sgm, 2fx0, 1zk8, 2fq4), and Streptomyces coelicolor (1ui5). The Z-scores for the structural alignments of TM1030 with these top hits are in the range of 14.1–7.4, and the corresponding RMSDs are in the range of 3.0–6.8 Å, with at least 75% of the Cα atoms (of the total 200 amino acids) included. Notably, these structures share less than 21% sequence identity with TM1030, and nine of these structures were determined at PSI-funded Structural Genomics (SG) centers. The N-terminal DNA-binding domain of TM1030 displays remarkable structural similarity to the homologous domains in all TetR-like proteins, while differences are much greater in the C-terminal regulatory domain. In particular, the relative orientation of the α-helices in the regulatory domain that mediates homodimerization4 differs significantly among members of the TetR family. A pair of α-helices (H8 and H9) in the TM1030 regulatory domain is involved in mediating most of the inter-subunit interaction. 
The inter-subunit interactions are mostly hydrophobic (V145, I149, F153, W156, F157, F161, V164, V189, M190, I193, and L194) and are further stabilized by four salt-bridges [D144(A) – R192(B), K152(A) – E186(B), D144(B) – R192(A), and K152(B) – E186(A)], and three hydrogen bonds [E163(A) – E163(B), K196(A) – T199(B), and K196(B) – T199(A)]. An analysis using size exclusion chromatography coupled with static light scattering supports the assignment of TM1030 as a dimer in solution. Furthermore, the biologically relevant homodimerization in the TetR family is mediated by similar helix-to-helix contacts,4, 29-31 suggesting that the dimer observed for TM1030 is functionally relevant. The Midwest Center for Structural Genomics (MCSG) has also determined the crystal structure of a TM1030 construct to 2.0 Å resolution (PDB code 1z77). A structural superposition revealed a significant global conformational difference between the two structures (Fig. 2), despite very similar crystallization conditions (Table II). Two modes of structural alignments were explored using as anchors either the conserved N-terminal DNA-binding domain or the C-terminal homodimerization-mediating α-helices (H8 and H9). The corresponding RMSD values for the N-terminal or C-terminal based structural alignments for all 200 Cα atoms of TM1030 (JCSG, subunits A and B) with TM1030 (MCSG) are in the range of 3.3–5.0 and 5.1–6.4 Å, respectively. In spite of the large structural difference, the global fold and individual structural elements are mostly retained in both structures, with noticeable differences confined to the lengths of α-helices H1, H3, H4, and H6A. Moreover, the inter-subunit interactions mediated by α-helices H8 and H9 in TM1030 (JCSG) are largely retained in TM1030 (MCSG), even though the biological dimer in TM1030 (JCSG) is formed from subunits related by twofold NCS, while the TM1030 (MCSG) subunits are related by exact crystallographic symmetry. 
The calculated total buried surface area between the monomers in TM1030 (JCSG) and TM1030 (MCSG) is also quite comparable [1497 Å2 (JCSG) vs. 1401 Å2 (MCSG)]. In addition, the residues involved in inter-subunit interactions are largely unperturbed in these structures, suggesting that the conformational changes between the two TM1030 structures are unlikely to be caused by the inter-subunit/crystal-packing interactions. When the structural superpositions are restricted to individual domains of TM1030 [Fig. 2(B,C)], regions of large conformational differences are mostly confined to the regulatory domain (RMSD of 1.75 Å for 153 Cα atoms) rather than in the DNA-binding domain (RMSD of 0.5 Å for 47 Cα atoms). 
Figure 2. Structure comparison of TM1030 (JCSG) and TM1030 (MCSG). A: Structure alignment of the full-length TM1030 monomer. The alignment was optimized over residues from the C-terminal homodimerization-mediating α-helices (H8-H9). B: The DNA-binding domain. C: The regulatory domain. JCSG and MCSG structures are shown in red and green, respectively. The α-helices, as well as the N- and C-termini, are indicated. 
While searching for a plausible basis for the conformational differences in the regulatory domain, we identified a ∼12 Å deep cavity in each of the TM1030 subunits that is located within the helical bundle of the regulatory domain [Fig. 3(A)]. The binding pocket, whose total volume is approximately 2000 Å3, is predominantly lined by hydrophobic residues. The TM1030 cavity has a 10–17 Å wide opening, formed by residues from α-helices H6A, H7, H7A, and H8 [Figs. 1(A) and 3(A)], which is likely to serve as an entrance to this putative binding pocket. The location of the cavity is in proximity to the ligand-binding pocket of TetR, but the volume of the cavity and the nature of the residues lining the cavity in the two proteins are quite different. 
Interestingly, each TM1030 (JCSG) subunit contains a semi-circular region of positive electron density in both the omit Fo − Fc and 2Fo − Fc maps in the ligand-binding pocket, indicative of a bound ligand [Fig. 3(B,C)]. However, such density was not observed in the TM1030 (MCSG) structure. The residues within 4 Å of this electron density in TM1030 (JCSG) are T59, L62, and F66 from α-helix H4; W85, I86, and K89 from α-helix H5; S124, Q125, and F128 from α-helix H7 and helix H7A; and F158, F161, E162, and Y165 from α-helix H8. The bound ligand is surrounded by hydrophobic, polar, and electrostatic groups including a cluster of aromatic rings. No biologically relevant ligand that could fit such electron density was added during the protein purification or crystallization of TM1030 (JCSG). Consideration of the shape and length of the density suggests it might represent a lipid molecule acquired in vivo within the E. coli host cells used for heterologous expression. However, the electron density is not consistent with any lipid with a head group or a carboxylate, such as palmitic acid. On the other hand, the density can be modeled by a relatively short fragment of PEG, in particular heptaethylene glycol [Fig. 3(D)], that probably originates from the crystallization solution (see e.g. Koepke et al., 2003; Zhu et al., 2006).32, 33 Nevertheless, as the precise identity of the ligand molecule has yet to be determined, we modeled and deposited it in the PDB as a UNL. The absence of a ligand in TM1030 (MCSG) together with the difference in conformation strongly suggests that this structure represents the apo-form of TM1030. A DNA-binding model for TM1030 based on TetR-tetO complex shows that the two DNA-binding domains from apo-TM1030 fit into the major groove of dsDNA, whereas the ligand-bound form of TM1030 is not in a favorable conformation to bind dsDNA [Fig. 3(E)], suggesting that TM1030 is most likely a transcription repressor. 
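A list of residues within 4 Å of a ligand, such as the one given above, is typically obtained with a distance-cutoff search over atomic coordinates. The following is a minimal sketch of that selection, not the procedure used for the deposited structure; the function name `residues_near_ligand` and the toy coordinates are hypothetical and are not taken from PDB entry 1zkg.

```python
import math


def residues_near_ligand(protein_atoms, ligand_coords, cutoff=4.0):
    """Return sorted residue labels having any atom within `cutoff` Å of the ligand.

    protein_atoms: iterable of (residue_label, (x, y, z)) tuples, one per atom.
    ligand_coords: iterable of (x, y, z) tuples for the ligand atoms.
    """
    near = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_coords):
            near.add(residue)
    return sorted(near)


# Toy coordinates (Å), invented for illustration only.
toy_protein = [
    ("F66", (0.0, 0.0, 3.5)),   # 3.5 Å from the first ligand atom
    ("W85", (0.0, 0.0, 9.0)),   # too far from both ligand atoms
    ("T59", (3.0, 0.0, 0.0)),   # 1.5 Å from the second ligand atom
    ("T59", (8.0, 0.0, 0.0)),   # second atom of the same residue, too far
]
toy_ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]

contacts = residues_near_ligand(toy_protein, toy_ligand)
print(contacts)
```

A residue is reported once even if several of its atoms fall inside the cutoff, which matches how contact lists of this kind are usually presented. For real structures, a spatial index would be preferable to this all-against-all comparison.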
Figure 3. Ligand-binding pocket and DNA-binding model. A: Surface representation of the ligand-binding pocket. B: The omit Fo − Fc electron density map corresponding to a region in the ligand-binding pocket. C: The residues within 4 Å of the bound ligand in TM1030 (JCSG). D: The omit Fo − Fc electron density map. A heptaethylene glycol molecule has been modeled to fit the electron density, but coordinates for the ligand are deposited in the PDB as a UNL. The electron density is contoured at 2.5 σ. E: Computational model of TM1030-operator dsDNA complex. The model is based on the crystal structure of TetR-operator dsDNA complex (PDB accession code: 1qpi). Apo-TM1030 and ligand-bound TM1030 are colored in blue and grey, respectively. N- and C-termini of TM1030 are indicated. 
One of the fundamental means by which bacteria adapt to varying environmental conditions is based on the regulation of gene expression at the transcriptional level.1-3, 34 Structural information regarding these transcriptional regulators is crucial to our understanding of how transcriptional regulatory networks control the microbial responses to different environmental challenges, including multidrug resistance, solvent tolerance, stress response, and pathogenesis. The efforts of the PSI-funded SG centers have resulted so far in the determination of nine TetR-like protein structures. Although these TetR-like structures share a high degree of overall fold similarity, their structures, particularly those of the regulatory domains, are very divergent and cannot be readily predicted. The ability to create novel binding sites for various effectors within the regulatory domain of proteins is perhaps driven by mutations that have little effect on the overall three-dimensional structure, but exert a large effect on the plasticity of effector binding sites. 
A detailed structural analysis, including identification of a biologically relevant ligand for TM1030, will offer insights into the structural basis of its transcription repressor function. Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL), the Advanced Light Source (ALS), and the Advanced Photon Source (APS). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences). The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098 at Lawrence Berkeley National Laboratory. Use of the Argonne National Laboratory Structural Biology Center beamlines at the APS was supported by the U.S. Department of Energy, Office of Biological and Environmental Research, under Contract No. W-31-109-ENG-38.