Targets for drugs have so far been predicted on the basis of molecular or cellular features, for example, by exploiting similarity in chemical structure or in activity across cell lines. We used phenotypic side-effect similarities to infer whether two drugs share a target. Applied to 746 marketed drugs, a network of 1018 side effect-driven drug-drug relations became apparent, 261 of which are formed by chemically dissimilar drugs from different therapeutic indications. We experimentally tested 20 of these unexpected drug-drug relations and validated 13 implied drug-target relations by in vitro binding assays, of which 11 reveal inhibition constants equal to less than 10 micromolar. Nine of these were tested and confirmed in cell assays, documenting the feasibility of using phenotypic information to infer molecular interactions and hinting at new uses of marketed drugs.
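The idea of scoring two drugs by the overlap of their side effects can be illustrated with a minimal sketch. It assumes a plain Jaccard index over unweighted side-effect sets; the abstract does not specify the similarity measure, so the function names, the threshold and the toy drugs below are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared side effects divided by all side effects seen for either drug."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(side_effects: dict, threshold: float) -> list:
    """Return (drug_1, drug_2, score) for every pair scoring at or above threshold."""
    pairs = []
    for d1, d2 in combinations(sorted(side_effects), 2):
        score = jaccard(side_effects[d1], side_effects[d2])
        if score >= threshold:
            pairs.append((d1, d2, score))
    return pairs


# Hypothetical toy data, not taken from the study.
toy = {
    "drug_a": {"nausea", "headache", "dizziness"},
    "drug_b": {"nausea", "headache", "rash"},
    "drug_c": {"insomnia"},
}
print(similar_pairs(toy, threshold=0.4))
```

Pairs that pass the threshold are candidates for sharing a target; in the study such candidates were then tested experimentally.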
Unwanted side effects of drugs are a burden on patients and a severe impediment in the development of new drugs. At the same time, adverse drug reactions (ADRs) recorded during clinical trials are an important source of human phenotypic data. It is therefore essential to combine data on drugs, targets and side effects into a more complete picture of the therapeutic mechanisms of action of drugs and the ways in which they cause adverse reactions. To this end, we have created the SIDER ('Side Effect Resource', http://sideeffects.embl.de) database of drugs and ADRs. The current release, SIDER 4, contains data on 1430 drugs, 5880 ADRs and 140 064 drug-ADR pairs, which is an increase of 40% compared to the previous version. For more fine-grained analyses, we extracted the frequency with which side effects occur from the package inserts. This information is available for 39% of drug-ADR pairs, 19% of which can be compared to the frequency under placebo treatment. SIDER furthermore contains a data set of drug indications, extracted from the package inserts using natural language processing. These drug indications are used to reduce the rate of false positives by identifying medical terms that do not correspond to ADRs.
ABSTRACT Arena3D web is an interactive web tool that visualizes multi-layered networks in 3D space. In this update, Arena3D web supports directed networks as well as up to nine different types of connections between pairs of nodes with the use of Bézier curves. It comes with different color schemes (light/gray/dark mode), custom channel coloring, four node clustering algorithms which one can run on-the-fly, visualization in VR mode and predefined layer layouts (zig-zag, star and cube). This update also includes enhanced navigation controls (mouse orbit controls, layer dragging and layer/node selection), while its newly developed API allows integration with external applications as well as saving and loading of sessions in JSON format. Finally, a dedicated Cytoscape app has been developed, through which users can automatically send their 2D networks from Cytoscape to Arena3D web for 3D multi-layer visualization. Arena3D web is accessible at http://arena3d.pavlopouloslab.info or http://arena3d.org
Additional file 2: Fig. S1. We show the total number of E. coli softcore genes’ related publications (red line relative to the left y-axis) and the total number of genes mentioned in the respective literature (blue line relative to the right y-axis) from year 1939 up to year 2021. The blue dashed vertical lines mark the expansion period for the total number of genes from year 1965 to 2009. It apparently plateaus after the year 2019. The red dashed vertical lines at years 1970 and 2007 indicate two periods of publication dynamics: 1970–2007 and 2007–2021. The ratio of the number of publications in each year to the total number of new genes identified in each year is shown in the insert. Fig. S2. FPE plots for different FPE score ranges from year 1960 until 2021 for E. coli K-12 genes are separately shown for five different categories, i.e. (A) very understudied, (B) understudied, (C) moderately studied, (D) intensively studied and (E) very intensively studied. The y-axis is given in the same scale for visual comparison across different categories. Fig. S3. We illustrate the number of new genes of E. coli K-12 achieving the FPE score ranges (T0, T1, T5, T10, T15, T20, T25, T30, T35, T40, T45, T50, T75, T100, T500) across the years in (A) phase 1 and (B) phase 2 periods. The linear regression line (number of new genes (y-axis) versus year (x-axis)) is shown. The magnitude of the slope is provided in Table 2. Fig. S4. FPE plots for different FPE score range from year 1960 until 2021 for E. coli softcore genes are separately shown for five different categories, i.e. (A) very understudied, (B) understudied, (C) moderately studied, (D) intensively studied and (E) very intensively studied. The y-axis is given in the same scale for visual comparison across different categories. Fig. S5. We illustrate the number of new genes of the E. 
coli softcore genome achieving the FPE score ranges (T0, T1, T5, T10, T15, T20, T25, T30, T35, T40, T45, T50, T75, T100, T500) across the years in (A) phase 1 and (B) phase 2 periods. The linear regression line (number of new genes (y-axis) versus year (x-axis)) is shown. The magnitude of the slope is provided in Additional file 1: Table S5. Fig. S6. Prediction of the transmembrane (TM) region in the protein sequence yahV (GF_29643) in E. coli K-12 MG1655 using TMHMM 2.0. The TM region is predicted to cover positions 4-23 of the protein sequence. Fig. S7. The upstream and downstream genes of yahV based on NCBI RefSeq. The betABIT operon is upstream of yahV gene. betABIT is expressed only under aerobic condition during osmotic stress for production of osmoprotectants. The pdeL gene, on the other hand, is downstream of the gene yahV. The pdeL gene appears involved in the regulation of cell motility. Fig. S8. Neighboring gene families of GF_29643 (yahV; circled in red) focusing on genomes that carry GF_29643. Ten GFs upstream and ten GFs downstream of GF_29643 are extracted and investigated. Each GF is represented as a node and two nodes are linked by an edge if they are next to each other. The thickness of the edge represents the weighted link between the two GFs. Clearly, GF_29643’s genomic position is conserved across the E. coli genomes that carry the yahV gene. Note that GF_8617 represents the betT gene and GF_25808 contains the pdeL gene. Fig. S9. The predicted transmembrane beta-barrel (TMBB) structure of protein yddL (GF_4841) using BetAware-Deep. The predicted localization is outer membrane TMBB with the overall TMBB probability of 0.93. There are four (4) TM β-strand segments as shown in the figure. Fig. S10. We illustrate the GFs associated with GF_29643, GF_4841 and GF_8394. The associated GFs of these three GFs have high overlap with each other and, therefore, can be related. 
Each node represents a GF and the edge (connecting line) indicates a significant coincident association between nodes (P-value ≤ 1 × 10⁻²⁰). The size of the node is determined by the node’s degree (the number of associated GFs). The color of the node follows a gradient from grey to red, which is also determined by the node’s degree. The three cluster-founding GFs are highlighted by red arrows. Please note that only 60 out of 68 GFs found are present in E. coli K-12 MG1655. Fig. S11. The number of overlapping associated GFs among three GFs, i.e., GF_29643, GF_8394 and GF_4841. Fig. S12. Manual annotation of associated GFs to GF_29643 (yahV), GF_4841 (yddL), and GF_8394 (paaE). There are four potential biological processes related to these 3 GFs, i.e. osmotic regulation, stress response, cell motility and energy metabolism. The corresponding genes are given for each biological process. The genes with unclear function are given as “Not Clear”. Fig. S13. The protein expression of 11 genes extracted from Caglar’s proteomics data. Only 11 genes out of 30 gene families, which are fully connected or significantly associated to each other, have protein expression in Caglar’s proteomics data. Please note that the E. coli strain used in Caglar’s study is E. coli REL606, which belongs to phylogroup A (sequence type ST93). This is different from E. coli K-12 MG1655, which has sequence type ST10. The highlighted box (with a red dashed line) emphasizes the expression results from cultures under NaCl_Stress condition. Fig. S14. We visualize the gene expression of 19 genes extracted from the Metris et al. data in accordance with osmotic conditions. These 19 genes are from our set of 30 GFs, which are fully connected or significantly associated to each other. Please note that the E. coli strain used in Metris’ study is E. coli K-12 MG1655, which is the same as the E. coli strain in our analysis.
For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface.
Article, 28 February 2012, Open Access
Genes adopt non-optimal codon usage to generate cell cycle-dependent oscillations in protein levels
Milana Frenkel-Morgenstern 1,2,*, Tamar Danon 1, Thomas Christian 3, Takao Igarashi 3, Lydia Cohen 1, Ya-Ming Hou 3 and Lars Juhl Jensen 4
1 Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
2 Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
3 Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA, USA
4 Disease Systems Biology, Novo Nordisk Foundation for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
Present address (Milana Frenkel-Morgenstern): Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
*Corresponding author. Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel. Tel.: +34 601046898; Fax: +34 912945037; E-mail: [email protected] or E-mail: [email protected]
Molecular Systems Biology (2012) 8:572 https://doi.org/10.1038/msb.2012.3
The cell cycle is a temporal program that regulates DNA synthesis and cell division. When we compared the codon usage of cell cycle-regulated genes with that of other genes, we discovered that there is a significant preference for non-optimal codons. Moreover, genes encoding proteins that cycle at the protein level exhibit non-optimal codon preferences. Remarkably, cell cycle-regulated genes expressed in different phases display different codon preferences. Here, we show empirically that transfer RNA (tRNA) expression is indeed highest in the G2 phase of the cell cycle, consistent with the non-optimal codon usage of genes expressed at this time, and lowest toward the end of G1, reflecting the optimal codon usage of G1 genes. Accordingly, protein levels of human glycyl-, threonyl-, and glutamyl-prolyl tRNA synthetases were found to oscillate, peaking in G2/M phase. In light of our findings, we propose that non-optimal (wobbly) matching codons influence protein synthesis during the cell cycle. We describe a new mathematical model that shows how codon usage can give rise to cell-cycle regulation. In summary, our data indicate that cells exploit wobbling to generate cell cycle-dependent dynamics of proteins.
Synopsis
Most cell cycle-regulated genes adopt non-optimal codon usage, namely, their translation involves wobbly matching codons. Here, the authors show that tRNA expression is cyclic and that codon usage, therefore, can give rise to cell-cycle regulation of proteins. Most cell cycle-regulated genes adopt non-optimal codon usage. Non-optimal codon usage can give rise to cell-cycle dynamics at the protein level.
The high expression of transfer RNAs (tRNAs) observed in G2 phase enables cell cycle-regulated genes to adopt non-optimal codon usage, and conversely the lower expression of tRNAs at the end of G1 phase is associated with optimal codon usage. The protein levels of aminoacyl-tRNA synthetases oscillate, peaking in G2/M phase, consistent with the observed cyclic expression of tRNAs.
Introduction
The cell cycle is a fundamental cellular process that allows cells to multiply and faithfully transfer their genetic information to their offspring (Csikász-Nagy, 2009). The full complexity of this process became apparent a decade ago with the first genome-wide microarray studies of the mitotic cell cycle of budding yeast (Cho et al, 1998; Spellman et al, 1998). During the eukaryotic cell cycle, gene expression is regulated at different levels, including through the translation of mRNAs into proteins (Sonenberg and Hinnebusch, 2009). Accurate translation is a complex event coordinated by essential components of the cell, such as the ribosome, messenger RNAs, aminoacylated (charged) transfer RNAs (tRNAs), and a host of additional protein and RNA factors (Francklyn et al, 2002; Lackner and Bähler, 2008). The tRNAs have a central role in translation as they are adaptor molecules that link the nucleotide sequence of the mRNA and the amino-acid sequence of a protein (Lowe and Eddy, 1997; Percudani et al, 1997; Schattner et al, 2005; Goodenbour and Pan, 2006). The expression of tRNAs is tissue specific and it varies in distinct cellular conditions (Dittmar et al, 2006). Recent studies demonstrate that the redundancy of the genetic code allows a choice to be made between 'synonymous' codons for the same amino acid, which may have dramatic effects on the rate of translation due to the tRNA recycling and channeling into the ribosome (Cannarozzi et al, 2010; Weygand-Durasevic and Ibba, 2010; Brackley et al, 2011; Gingold and Pilpel, 2011; Plotkin and Kudla, 2011).
Moreover, mRNAs usually start by using the codons corresponding to rarer tRNAs, undergoing a slower phase of elongation, which is then followed by a faster phase (Tuller et al, 2010). The 'redundancy' in the genetic code implies that 61 codons are translated requiring fewer than 61 tRNAs according to the 'wobble' base-pairing rules (isoaccepting codons; Crick, 1966). This is especially true when the base at the 5′ end of the anticodon is inosine (abbreviated as I), which deviates from the standard base-pairing rules. The four main wobble base pairs are guanine-uracil, inosine-uracil, inosine-adenine, and inosine-cytosine (G:U, I:U, I:A, I:C; Lander et al, 2001). Finally, the Percudani rules state that tRNAs only wobble with a synonymous codon if there is no better tRNA for that codon (Percudani et al, 1997). Due to the degeneracy of the genetic code, all amino acids except methionine and tryptophan are encoded by multiple, synonymous codons. The usage of synonymous codons is far from uniform and there is a strong preference toward certain codons in highly expressed genes when compared with other genes (Sharp et al, 1986; Lavner and Kotlar, 2005; Goodenbour and Pan, 2006). Indeed, codon usage preferences are closely correlated with the abundance of corresponding tRNAs in bacteria and yeast (Grantham et al, 1981; Ikemura, 1981, 1982; Futcher et al, 1999), which maximizes the speed and accuracy of protein translation (Gouy and Gautier, 1982; Ikemura, 1985; Akashi and Eyre-Walker, 1998; Duret and Mouchiroud, 1999; Coghlan and Wolfe, 2000; Duret, 2000; Wright et al, 2004; Drummond et al, 2006). However, charging level of some tRNAs matches some anomalous codon usage patterns for different groups of genes in bacteria (Liljenström et al, 1985; Dittmar et al, 2005). Moreover, the correspondence between codon adaptation and gene expression makes translation efficient at a global level rather than at the level of specific genes (Kudla et al, 2009). 
More specifically, the first 30–50 codons of most mRNA sequences are less efficiently translated than the following part of their sequences (Tuller et al, 2010). The optimal correlation between tRNA levels and their corresponding codon frequencies is dependent on the total amount of tRNAs, ribosomes (Kudla et al, 2009), and the aminoacyl-tRNA synthetases (aaRSs) that charge tRNAs through a two-step aminoacylation reaction using ATP (Orfanoudakis et al, 1987). Finally, changes in the ATP availability in cells influence the concentration of charged tRNAs during a cell cycle (Ibba and Söll, 2004). Non-optimal codons adopt wobble codon–anticodon base pairing with a low binding affinity. Recent studies revealed that synonymous changes for non-optimal codons can alter the expression of human genes (Kimchi-Sarfaty et al, 2007). Moreover, the codons with the least amount of tRNAs and, thus, the lowest rate of translation, do not necessarily have the lowest genome frequency (Parmley and Huynen, 2009), and they may fulfill a role in translation 'pausing' between protein domains (Makhoul and Trifonov, 2002). However, the function of non-optimal codons, in general, and of wobble codon–anticodon base pairing, in particular, in regulating the temporal aspects of protein translation remains unclear in eukaryotes. We have studied translation regulation of cell cycle-dependent genes through comparative analyses of codon preferences, dynamic quantitative proteomics (Sigal et al, 2006a; Cohen et al, 2008) and mathematical modeling. We discovered that in four distant eukaryotes, proteins encoded by cell cycle-regulated mRNAs have similar preferences in terms of non-optimal codon usage and wobble codon–anticodon base pairing. The dynamics of the charged tRNA pool is expected to vary during the cell cycle as a result of the variations in the ATP availability (Orfanoudakis et al, 1987).
In addition, we found experimentally that the levels of glycyl-, threonyl-, and glutamyl-prolyl-aminoacyl-tRNA synthetases oscillate during the human cell cycle, and that tRNA expression levels increase in the G2/M phase of the yeast cell cycle. Moreover, tRNAs are most weakly expressed toward the end of G1 phase. Similarly, we found that genes expressed in different phases of the cell cycle adopt different codon preferences. We show that about 15% of the cell cycle-regulated genes expressed in the G1 phase adopt relatively optimal codon usage, even at the beginning of their coding sequences. All other cell cycle-regulated genes prefer non-optimal codons for their coding sequences. Finally, we developed a mathematical model based on a competitive mechanism in which the cycling of charged tRNAs leads to oscillations in the rate of translation for mRNAs containing non-optimal codons.
Results
Codon preferences of cell cycle-regulated genes
In unicellular prokaryotes and eukaryotes, the abundance of certain tRNAs correlates with the codon preferences of genes encoding highly expressed proteins, for example, ribosomal proteins (Percudani et al, 1997; Kanaya et al, 1999; Bernstein et al, 2002; Lavner and Kotlar, 2005; Kotlar and Lavner, 2006). Thus, codons that perfectly match the anticodons of the tRNAs are preferentially used in highly expressed genes (Grosjean and Fiers, 1982). The mRNAs coding for rare proteins also have selective codon usage, albeit much weaker than the mRNAs coding abundant proteins (Liljenström and von Heijne, 1987). We hypothesized that cell cycle-regulated genes should also exhibit a preference for certain codons and thus, we analyzed the codon usage preferences for synonymous codons in three sets of human cell cycle-regulated genes, B1, B2, top-600, from an earlier study (Jensen et al, 2006; see Materials and methods).
Although the B1 set of genes is the most reliable group of cycling genes, it includes highly expressed genes that are strongly biased in terms of their codon usage, a situation which is undesirable for our purposes. By contrast, highly expressed genes are not so abundant in the B2 and top-600 sets, although these sets are somewhat less reliable. The three sets of cell cycle-regulated genes gave consistent results, all showing either positive or negative preferences for a given codon (Table I). To evaluate the statistical significance of this result, P-values were calculated from 10 000 bootstrap samples with the same codon adaptation index (CAI) distribution as cell cycle-regulated genes (see Materials and methods and Table I). The codon preference was considered significant when the P-value was <0.01 for at least two of the three sets of cycling genes (Table I). In fact, the codon usage is confounded by the local GC content (Drummond and Wilke, 2008) and thus, we performed an additional bootstrap procedure preserving the GC content instead of the CAI distribution of the cell cycle-regulated genes. The P-values obtained by this procedure did not alter the final conclusions (see Supplementary information).

Table I. The codon preferences for the sets of human cell cycle-regulated genes: B1, B2 and top-600 sets (Jensen et al, 2006)

| Aa | Codon 5′ → 3′ | Preference B1 | Preference B2 | Preference Top-600 | P-value B1 | P-value B2 | P-value Top-600 | Anticodon 3′ → 5′ | Binding at third position | Affinity | Organism a (S.p., S.c., A.t.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ala | GCA | 0.04 | 0.05 | 0.03 | 0.05 | 0.09 | 0.14 | CGI | I:A | Low | • |
| Ala | GCC | −0.1 | −0.07 | −0.04 | 0.0001 | 0.01 | 0.16 | CGI | I:C | High | • |
| Ala | GCG | −0.01 | −0.03 | −0.02 | 0.58 | 0.01 | 0.03 | CGC | C:G | High | • • • |
| Ala | GCT | 0.07 | 0.05 | 0.03 | 0.0001 | 0.0001 | 0.05 | CGI | I:T | Low | • • • |
| Arg | AGA | 0.07 | 0.05 | 0.04 | 0.02 | 0.14 | 0.13 | UCU | U:A | Low | • |
| Arg | AGG | −0.02 | −0.02 | −0.01 | 0.17 | 0.0001 | 0.02 | UCC | C:G | High | • • |
| Arg | CGA | 0 | 0.03 | 0.02 | 0.19 | 0.0001 | 0.0001 | GCI | I:A | Low | |
| Arg | CGC | −0.01 | −0.04 | −0.03 | 0.75 | 0.09 | 0.06 | GCI | I:C | High | |
| Arg | CGG | −0.06 | −0.04 | −0.03 | 0.0001 | 0.2 | 0.19 | GCC | C:G | High | • • • |
| Arg | CGT | 0.02 | 0.02 | 0.01 | 0.07 | 0.0001 | 0.04 | GCI | I:T | Low | • • |
| Asn | AAC | −0.13 | −0.11 | −0.08 | 0.0001 | 0.0001 | 0.0001 | UUG | G:C | High | • |
| Asn | AAT | 0.13 | 0.11 | 0.08 | 0.0001 | 0.0001 | 0.0001 | UUG | G:T | Low | • |
| Asp | GAC | −0.1 | −0.1 | −0.07 | 0.01 | 0.0001 | 0.01 | CUG | G:C | High | • |
| Asp | GAT | 0.1 | 0.1 | 0.07 | 0.01 | 0.0001 | 0.01 | CUG | G:T | Low | • |
| Cys | TGC | −0.15 | −0.12 | −0.04 | 0.0001 | 0.0001 | 0.37 | UCG | G:C | High | • • • |
| Cys | TGT | 0.15 | 0.12 | 0.04 | 0.0001 | 0.0001 | 0.37 | UCG | G:T | Low | • • • |
| Gln | CAA | 0.1 | 0.06 | 0.05 | 0.0001 | 0.42 | 0.09 | GUU | U:A | Low | |
| Gln | CAG | −0.1 | −0.06 | −0.05 | 0.0001 | 0.43 | 0.1 | GUC | C:G | High | |
| Glu | GAA | 0.13 | 0.1 | 0.08 | 0.0001 | 0.03 | 0.04 | CUU | U:A | Low | • • |
| Glu | GAG | −0.13 | −0.1 | −0.08 | 0.0001 | 0.04 | 0.04 | CUC | C:G | High | • • |
| Gly | GGA | 0.04 | 0.05 | 0.04 | 0.35 | 0.15 | 0.29 | CCU | U:A | Low | • |
| Gly | GGC | −0.04 | −0.06 | −0.04 | 0.17 | 0.01 | 0.21 | CCG | G:C | High | • |
| Gly | GGG | −0.05 | −0.03 | −0.03 | 0.02 | 0.09 | 0.0001 | CCC | C:G | High | • |
| Gly | GGT | 0.05 | 0.04 | 0.03 | 0.01 | 0.0001 | 0.0001 | CCG | G:T | Low | • • |
| His | CAC | −0.14 | −0.13 | −0.07 | 0.0001 | 0.0001 | 0.05 | GUG | G:C | High | • • |
| His | CAT | 0.14 | 0.13 | 0.07 | 0.0001 | 0.0001 | 0.05 | GUG | G:T | Low | • • |
| Ile | ATA | 0.05 | 0.05 | 0.04 | 0.03 | 0.12 | 0.08 | UAI | I:A | Low | |
| Ile | ATC | −0.12 | −0.12 | −0.08 | 0.01 | 0.0001 | 0.02 | UAI | I:C | High | • |
| Ile | ATT | 0.07 | 0.07 | 0.04 | 0.02 | 0.0001 | 0.06 | UAI | I:T | Low | • • • |
| Leu | CTA | 0.02 | 0.01 | 0.01 | 0.0001 | 0.03 | 0.02 | GUI | I:A | Low | • |
| Leu | CTC | −0.05 | −0.04 | −0.03 | 0.0001 | 0.0001 | 0.0001 | GUI | I:C | High | • • |
| Leu | CTG | −0.1 | −0.08 | −0.06 | 0.0001 | 0.04 | 0.02 | GUC | C:G | High | |
| Leu | CTT | 0.03 | 0.04 | 0.03 | 0.06 | 0.01 | 0.05 | GUI | I:T | Low | • • |
| Leu | TTA | 0.06 | 0.04 | 0.03 | 0.0001 | 0.03 | 0.0001 | AAU | U:A | Low | • |
| Leu | TTG | 0.04 | 0.03 | 0.02 | 0.0001 | 0.0001 | 0.0001 | AAC | C:G | High | • • |
| Lys | AAA | 0.04 | 0.09 | 0.06 | 0.43 | 0.01 | 0.11 | UUU | U:A | Low | |
| Lys | AAG | −0.04 | −0.09 | −0.06 | 0.44 | 0.01 | 0.11 | UUC | C:G | High | |
| Met | ATG | 0 | 0 | 0 | 1 | 1 | 1 | UAC | C:G | High | • • • |
| Phe | TTC | −0.13 | −0.1 | −0.07 | 0.0001 | 0.0001 | 0.0001 | AAG | G:C | High | • |
| Phe | TTT | 0.13 | 0.1 | 0.07 | 0.0001 | 0.0001 | 0.0001 | AAG | G:T | Low | • |
| Pro | CCA | 0.07 | 0.04 | 0.04 | 0.01 | 0.06 | 0.02 | GGI | I:A | Low | • • • |
| Pro | CCC | −0.1 | −0.06 | −0.06 | 0.0001 | 0.02 | 0.0001 | GGI | I:C | High | • • |
| Pro | CCG | −0.02 | −0.03 | −0.02 | 0.19 | 0.02 | 0.07 | GGC | C:G | High | • • • |
| Pro | CCT | 0.05 | 0.05 | 0.04 | 0.01 | 0.0001 | 0.0001 | GGI | I:T | Low | |
| Ser | AGC | −0.05 | −0.05 | −0.03 | 0.0001 | 0.0001 | 0.02 | UCG | G:C | High | • • |
| Ser | AGT | 0.03 | 0.04 | 0.03 | 0.03 | 0.0001 | 0.0001 | UCG | G:T | Low | |
| Ser | TCA | 0.03 | 0.03 | 0.02 | 0.1 | 0.02 | 0.34 | AGI | I:A | Low | • |
| Ser | TCC | −0.05 | −0.04 | −0.03 | 0.0001 | 0.0001 | 0.0001 | AGI | I:C | High | • |
| Ser | TCG | −0.02 | −0.02 | −0.02 | 0.0001 | 0.01 | 0.57 | AGC | C:G | High | • • |
| Ser | TCT | 0.06 | 0.04 | 0.03 | 0.0001 | 0.0001 | 0.01 | AGI | I:T | Low | • • • |
| Thr | ACA | 0.01 | 0.03 | 0.02 | 0.63 | 0.29 | 0.72 | UGI | I:A | Low | • |
| Thr | ACC | −0.05 | −0.07 | −0.05 | 0.09 | 0.0001 | 0.1 | UGI | I:C | High | • |
| Thr | ACG | −0.05 | −0.03 | −0.02 | 0.0001 | 0.0001 | 0.11 | UGC | C:G | High | • • • |
| Thr | ACT | 0.09 | 0.07 | 0.05 | 0.0001 | 0.0001 | 0.0001 | UGI | I:T | Low | • • • |
| Trp | TGG | 0 | 0 | 0 | 1 | 1 | 1 | ACC | G:C | High | • • • |
| Tyr | TAC | −0.08 | −0.1 | −0.06 | 0.03 | 0.0001 | 0.05 | AUG | G:C | High | • |
| Tyr | TAT | 0.08 | 0.1 | 0.06 | 0.04 | 0.0001 | 0.05 | AUG | G:T | Low | • |
| Val | GTA | 0.07 | 0.05 | 0.03 | 0.0001 | 0.0001 | 0.0001 | CUI | I:A | Low | |
| Val | GTC | −0.05 | −0.05 | −0.03 | 0.0001 | 0.0001 | 0.0001 | CUI | I:C | High | • |
| Val | GTG | −0.09 | −0.06 | −0.05 | 0.01 | 0.15 | 0.03 | CUC | C:G | High | • |
| Val | GTT | 0.07 | 0.06 | 0.05 | 0.01 | 0.01 | 0.01 | CUI | I:T | Low | • • |

We found that cell cycle-regulated genes prefer non-optimal codons, which are recognized by wobble base pairing, and thus have a low codon–anticodon binding affinity (Table I). For instance, TTT was overrepresented among cycling genes when we consider the TTT and TTC codons of phenylalanine (Table I). While no tRNA genes exist for the corresponding AAA anticodon, a tRNA gene does exist with the GAA anticodon.
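The bootstrap test used for the P-values in Table I can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: each gene is reduced to a hypothetical `(cai, codon_fraction)` pair, the CAI distribution is preserved by drawing each bootstrap gene from the same CAI bin as a cycling gene, the test is two-sided, and the add-one P-value estimate is our choice.

```python
import random
from collections import defaultdict
from statistics import mean


def bootstrap_p_value(test_genes, background_genes,
                      n_samples=10_000, bin_width=0.1, seed=0):
    """Return (observed preference, empirical two-sided P-value).

    Each gene is a (cai, codon_fraction) tuple, where codon_fraction is the
    share of one codon among its synonymous codons in that gene (assumed
    representation, not taken from the paper).
    """
    rng = random.Random(seed)
    # Group background codon fractions by CAI bin so samples can match CAI.
    by_bin = defaultdict(list)
    for cai, frac in background_genes:
        by_bin[int(cai / bin_width)].append(frac)
    bins = [int(cai / bin_width) for cai, _ in test_genes]
    if any(b not in by_bin for b in bins):
        raise ValueError("a test gene has no background gene in its CAI bin")

    background_mean = mean(frac for _, frac in background_genes)
    observed = mean(frac for _, frac in test_genes) - background_mean

    extreme = 0
    for _ in range(n_samples):
        # One CAI-matched background gene per cycling gene.
        sample = [rng.choice(by_bin[b]) for b in bins]
        if abs(mean(sample) - background_mean) >= abs(observed):
            extreme += 1
    # Add-one estimate keeps the P-value above zero.
    return observed, (extreme + 1) / (n_samples + 1)
```

A GC-content-preserving variant, as mentioned in the text, would bin genes by GC content instead of CAI.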
In addition, asparagine, aspartic acid, cysteine, histidine, and tyrosine were similarly seen to display a preference for the non-optimal codons (Table I). Using accurate thermodynamic data for binding affinities of all possible wobble base-pairing cases (I:C, I:A, I:T, G:T, G:C, C:G, U:A) (Watkins and SantaLucia, 2005), we found that, for all amino acids, cell cycle-regulated genes have a strong, significant (P<0.01) preference for codons with a low codon–anticodon binding affinity (Table I). To assess the biological importance of the codon preferences observed, we tested whether they are evolutionarily conserved. To this end, we analyzed sets of cell cycle-regulated genes in Schizosaccharomyces pombe, Saccharomyces cerevisiae, and Arabidopsis thaliana (Jensen et al, 2006). For both yeast species, these genes show significant and consistent preferences for non-optimal codons of amino acids that use the inosine modification at the wobble position. There are eight such amino acids in Schizosaccharomyces pombe (as in higher eukaryotes) and seven in S. cerevisiae (Supplementary Tables 1 and 2). For Arabidopsis thaliana, a significant preference for non-optimal codons was found for amino acids encoded by two or more codons, also consistent with the trend in humans (Supplementary Table 3). Although the GC content of genes appears to influence the codon preferences of cell cycle-regulated genes in yeast (Supplementary Tables 4–7), the trends are nonetheless consistent with that observed for human genes. Together, these results show that the preference for using non-optimal codons to encode cell cycle-regulated proteins is conserved across distantly related eukaryotes (see Table I). To study whether the cell cycle-regulated genes expressed in different phases of the cell cycle adopt the same codon preferences, we used the top-600 set of genes. Notably, non-optimal codon usage was observed for genes expressed in all phases except the G1 phase (see Supplementary information).
In this phase of the cell cycle, both ATP and charged tRNA concentrations are likely to be low (Orfanoudakis et al, 1987), as is the total tRNA pool, which we found to be lowest toward the end of G1 phase in the yeast S. cerevisiae (Table II; Figure 1). As a result, relatively optimal codon preferences were observed in human and yeast genes expressed in G1 phase (Supplementary Table 8). Finally, we found that the level of aaRSs is also likely to be low in the G1 phase, while augmented in the G2/M phase of the human cell cycle (Figure 2A; Supplementary Figure 1). Taken together, these findings indicate that genes may use synonymous codons to adjust their expression pattern during a cell cycle.
Figure 1. The tRNA concentration during the cell cycle of S. cerevisiae. The concentration was calculated as an average of the different points in the same phases of the cell cycle according to Table II.
Figure 2. Total fluorescence as a function of time during two cell cycles for YFP-tagged proteins, glycyl-tRNA synthetase (GARS), threonyl-tRNA synthetase (TARS), tryptophanyl-tRNA synthetase (WARS), and glutamyl-prolyl-tRNA synthetase (EPRS), when compared with GAPDH and ARGLU1. (A) The lines represent the average fluorescence (±standard error) from >15 individual cells during two generations for the synthetases that show significant cell cycle-dependent protein dynamics. ARGLU1 is used as a positive control. (B) The total fluorescence (±standard error) for WARS and GAPDH as a negative control. WARS and GAPDH do not show cell cycle-dependent protein dynamics. Source data is available for this figure in the Supplementary Information.
(Source data for Figure 2A) Protein dynamics of cell-cycle-regulated proteins during two cell-cycles [msb20123-sup-0001-SourceData-S1.xls]
(Source data for Figure 2B) Protein dynamics of non cell-cycle-regulated proteins during two cell-cycles [msb20123-sup-0002-SourceData-S2.xls]
Table II. The concentration of tRNA during the cell cycle in the yeast S. cerevisiae
Time point (min)    tRNA concentration (mg/ml)    Estimated cell-cycle phase
0                   10.0                          Synchronized in M phase
30                  7.8                           M
60                  14.9                          G1
90                  13.7                          G1
120                 4.1                           G1
150                 10.8                          G1
180                 7.9                           S
210                 11.5                          S
240                 21.3                          G2
270                 21.5                          G2
300                 9.7                           M
330                 8.9                           M
360                 11.1                          M
Protein dynamics of aaRSs
aaRSs covalently attach amino acids to tRNAs and consequently, they have a fundamental role in controlling the amount of charged tRNAs available for protein synthesis (Ibba and Söll, 2004; Francklyn et al, 2008). Thus, we systematically measured the aaRSs available during the cell cycle of individual human cells. We used time-lapse microscopy to measure the dynamics of four aaRSs found in the LARC library (Sigal et al, 2006a, 2006b, 2007; Cohen et al, 2008; see Supplementary information), namely glycyl-tRNA synthetase (GARS), threonyl-tRNA synthetase (TARS), tryptophanyl-tRNA synthetase (WARS) and glutamyl-prolyl-tRNA synthetase (EPRS). In these studies, we also measured the dynamics of glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as a negative control and that of the arginine-glutamate-rich protein-1 (ARGLU1) as a positive control, the expression of which is regulated through the cell cycle at the protein and mRNA levels (Sigal et al, 2006a; Supplementary Figure 2). Each synthetase was tagged with the yellow fluorescent protein (eYFP) at its endogenous chromosomal location in the H1299 cell line (see Supplementary information), and the resulting videos (recorded over 72 h) were analyzed to quantify the accumulation of the proteins at each time point as described previously (Sigal et al, 2006a).
Cell-cycle regulation was defined on the basis of two criteria: at least a two-fold difference in the rate of accumulation over the cell cycle, and a difference of at least eight standard errors between the highest and lowest protein accumulation rates (Sigal et al, 2006a). Based on these criteria, the protein dynamics of GARS, TARS, EPRS, and ARGLU1 were clearly cell cycle dependent, whereas WARS and GAPDH could not be considered to have cell cycle-dependent protein dynamics (Figure 2; Supplementary Figure 1). Interestingly, glycine, threonine, and proline are each encoded by four different codons and glutamic acid by two codons. Therefore, cell cycle-dependent protein levels of GARS, TARS, and EPRS may be a source for the cell cycle-regulated behavior of charged tRNAGly, tRNAThr, tRNAGlu, and tRNAPro, as evident in our mathematical model described below. Tryptophan is encoded by only one codon, which leaves no margin for gene-specific, cell cycle-dependent translation rates through the use of suboptimal codons, and which would explain why WARS does not exhibit cell cycle-dependent protein dynamics (Figure 2B). In general, changes in the concentration of aaRSs are not necessary for all the corresponding amino acids to be cell cycle dependent because the ATP pool oscillates during the human cell cycle (Orfanoudakis et al, 1987), and because tRNA levels also rise and fall during the cell cycle (Table II; Figure 1). Thus, even when aaRS levels are at steady state, the cycling of ATP and tRNA levels together provides a mechanism to generate oscillating levels of charged tRNAs (aa-tRNAs). Taken together, these observations indicate that the availability of charged tRNAs during a cell cycle may regulate the expression of genes with regard to their codon usage preferences.
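The two-part criterion for cell-cycle regulation stated above can be sketched as a small classifier. This is a minimal illustration, not the procedure of Sigal et al (2006a): the function name is hypothetical, accumulation rates are assumed to be positive, and the standard error used for the comparison is assumed to be the larger of the two standard errors at the highest and lowest rates.

```python
def is_cell_cycle_regulated(rates, std_errs, min_fold=2.0, min_se_multiple=8.0):
    """Apply the two criteria to one protein's accumulation-rate profile.

    rates    -- accumulation rate at each cell-cycle time point (assumed > 0)
    std_errs -- standard error of each rate, same length as `rates`
    """
    if not rates or len(rates) != len(std_errs):
        raise ValueError("rates and std_errs must be non-empty and equally long")
    if min(rates) <= 0:
        raise ValueError("this sketch assumes strictly positive accumulation rates")

    hi = max(range(len(rates)), key=lambda i: rates[i])
    lo = min(range(len(rates)), key=lambda i: rates[i])

    # Criterion 1: at least a two-fold difference in accumulation rate.
    fold_ok = rates[hi] / rates[lo] >= min_fold

    # Criterion 2: highest and lowest rates differ by at least eight standard
    # errors (assumption: use the larger of the two standard errors).
    se = max(std_errs[hi], std_errs[lo])
    separation_ok = (rates[hi] - rates[lo]) >= min_se_multiple * se

    return fold_ok and separation_ok


# Toy profiles (invented numbers, for illustration only).
oscillating = is_cell_cycle_regulated([1.0, 2.5, 4.0, 1.5], [0.1, 0.1, 0.2, 0.1])
flat = is_cell_cycle_regulated([1.0, 1.2, 1.1], [0.05, 0.05, 0.05])
noisy = is_cell_cycle_regulated([1.0, 3.0], [0.5, 0.5])
print(oscillating, flat, noisy)
```

Requiring both conditions means that a large fold change measured with large uncertainty, as in the third toy profile, is not classified as cell cycle dependent.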
Codon usage of proteins with cell cycle-dependent protein dynamics
To evaluate the translational regulation of proteins that cycle at the protein level but not at the mRNA level, we used the protein data set studied previously (Sigal et al, 2006a), extended with five additional proteins (Figure 2). Thus, 11 proteins were found to have cycling protein levels but non-cycling mRNA levels (Whitfield et al, 2002; Gauthier et al, 2008, 2010): DDX5, USP7, TOP1, ANP32B, H2AFV, GTF2F2, RBBP7, SFRS10, GARS, TARS, and EPRS, which were determined to be cell cycle regulated in terms of protein dynamics in human cells. ARGLU1 cycles at the mRNA level and was excluded from that analysis. As a negative set, we used the 11 proteins that were found not to cycle at the protein level despite cycling at the mRNA level (Whitfield et al, 2002): SAE1, SET, HMGA2, YPEL1, DDX46, LMNA, HMGA1, ZNF433, KIAA1937, GAPDH, and WARS. The cell-cycle codon scores (CCCS) (see Materials and methods) were calculated for all the proteins analyzed (Supplementary Table 9) and, consistent with our hypothesis, we found a significant difference between the distributions of the two groups (Wilcoxon's test; P-value<1E−3) (Figure 3). All of the 11 cycling proteins had a positive CCCS, while the non-cycling proteins had both negative and positive scores (Figure 3). Taken together, these observations indicate that the presence of many non-optimal codons in a gene is not sufficient to cause large-amplitude oscillations at the protein level.
Figure 3. Comparison of CCCS for proteins with cell cycle-dependent protein dynamics versus proteins with non-cell cycle-dependent protein dynamics. The CCCS evaluates the proportion of wobble codon–anticodon base pairing similar to that of the top-600 genes. A red line represents the distribution mean.
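A two-group comparison of this kind can be sketched with a rank-sum test. The sketch below assumes that "Wilcoxon's test" refers to the unpaired rank-sum (Mann-Whitney) variant, uses a normal approximation with tie correction and no continuity correction, and runs on invented placeholder scores; the real CCCS values are in Supplementary Table 9 and are not reproduced here. The function name is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        mid_rank = (i + 1 + j) / 2.0   # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


# Placeholder scores only: 11 "cycling" and 11 "non-cycling" values, invented
# for illustration and NOT taken from the paper.
toy_cycling = [float(v) for v in range(1, 12)]
toy_non_cycling = [float(v) for v in range(-11, 0)]
u_stat, p_val = rank_sum_test(toy_cycling, toy_non_cycling)
print(u_stat, p_val)
```

For groups of 11, an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch dependency-free.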
Mathematical model
To describe mathematically how temporal changes in the tRNA pool can lead to translational regulation (Figure 4), we concentrated on only two processes: amino-acid charging of tRNAs by aaRSs (producing aminoacyl-tRNAs or 'aa-tRNAs'); and cognate or 'wobble' aa-tRNA binding to mRNAs. The rate of transport of aa-tRNA species to a ribosomal A site, the intrinsic kinetics of peptidyl transfer, and the concentration and translocation of ribosomes were not considered in this model.
Figure 4. A schematic presentation of the additional level of protein translation regulation via the tRNA pool. (A) The translation of poly-TTC and poly-TTT chains (used as an example) when the pool of charged tRNAs includes many TTC-tRNAPhe. (B) Changes in the translation rate of poly-TTC and poly-TTT chains if few TTC-tRNAPhe are available. (C) The oscillating tRNA pool may produce cell cycle-dependent translation of genes that use wobble codon–anticodon base pairing. The translation rate of proteins using optimal codons stays constant.
The aminoacylation reaction is achieved in two steps (Ibba and Söll, 2004).
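For orientation, the two steps of aminoacylation are commonly written as the textbook scheme below: activation of the amino acid, followed by transfer to the tRNA. The notation is generic and is not necessarily the one used in the authors' model; aa denotes the amino acid and PP_i inorganic pyrophosphate, with both steps catalyzed by the cognate aaRS.

```latex
\begin{align*}
\text{(1) activation:} \quad & \mathrm{aa} + \mathrm{ATP} \;\rightleftharpoons\; \mathrm{aa\text{-}AMP} + \mathrm{PP_i} \\
\text{(2) transfer:} \quad & \mathrm{aa\text{-}AMP} + \mathrm{tRNA} \;\longrightarrow\; \mathrm{aa\text{-}tRNA} + \mathrm{AMP}
\end{align*}
```

Because ATP is consumed in step (1) and uncharged tRNA in step (2), oscillations in either pool can in principle propagate to the level of charged tRNA.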
This file constitutes the text-mining channel on arabidopsis proteins for the last version of the COMPARTMENTS database to use STRING v9 protein identifiers.