Leveraging Microbial Genomes and Genomic Context for Chemical Discovery.

Duncan J. Kountz, Emily P. Balskus

2021

Microbial biochemistry has been studied for well over 100 years, and this field has revealed microbes, with their incredibly varied lifestyles, habitats, and metabolisms, to be a rich source of new chemistry. The discovery and characterization of microbial enzymes, metabolic pathways, and metabolites have influenced many branches of chemistry, informing the study of chemical reactivity,5 providing targets for total and semisynthesis,6 fueling drug development and medicinal chemistry,7 and revealing new tools for biocatalysis and biotechnology.8 Though well-established, microbial biochemistry has been transformed in the genomic era. This period (which began roughly in the mid-2000s and continues through the present) is one in which genomic sequencing has been widely available and, in recent years, ubiquitous. As of November 2020, over 229,000 permanent draft genomes and 20,100 complete genomes had been deposited in NCBI and listed in the Genomes Online Database (GOLD). The majority of these (62%) are from bacteria and archaea.9 This bounty is accompanied by a formidable gap between the number of genes sequenced and the number for which we can confidently assign a function. Even in E. coli, there is experimental evidence for the function of only about two-thirds of the predicted genes.10 In less well-characterized organisms, that fraction is much lower. For instance, as part of the expanded Human Microbiome Project, researchers could functionally annotate only 35–45% of the genes identified.11 This challenge extends to genes that are critical for life. Of the 473 apparently essential genes in a “minimal bacterial genome” derived from the genome of Mycoplasma mycoides, 31.5% had no known function at the time of publication.12 Clearly, the genomic era has provided vast amounts of data that have the potential to transform our understanding of microbes and their chemistry.
The major challenge we face in our current research era is how to best harness this information for discovery. Early biochemical discoveries stemmed from a common investigative approach, regardless of whether the investigator studied microbes or mammals. One usually found a compound or enzyme of interest by obtaining biomass, lysing the cells (or collecting the spent media), and then purifying the active species of interest. When genetic mapping and manipulation became available, the investigator could use purified protein to track down the encoding gene and perhaps disrupt or delete it. Overall, the process was a “forward” one that began with biomass and ended with a gene. In the genomic era, we can more frequently use “reverse” approaches that start with genes and end with biochemical functions. In these endeavors, microbial biochemists can also exploit a powerful genomic trait. Prokaryotes (bacteria and archaea), and sometimes fungi, organize their genomic information into gene clusters (defined below). Researchers can exploit this feature for biochemical discovery through the analysis of genomic context, providing opportunities that are largely unavailable for the study of macroscopic organisms, which typically do not utilize gene clusters.

This Account discusses how the analysis of gene clusters, operons, and genomic context can enable many facets of microbial biochemical research. In particular, we highlight how work in our laboratory has combined a chemical understanding of enzymes and metabolism with genomic context to solve difficult problems and simplify scientific challenges. The continued development and refinement of computational tools that exploit this feature will greatly accelerate efforts to connect genes with new biochemical functions.

What Is a Gene Cluster and Why Do Microorganisms Use Them?

A gene cluster consists of a syntenic set of genes, their intervening noncoding sequences, and adjacent regulatory elements.
These genes are typically functionally related (involved in the same pathway or process) and may be positively or negatively coregulated. Prokaryotes are known to cotranscribe sets of genes that are oriented in the same direction, generating a single molecule of mRNA containing multiple open reading frames (ORFs). An operon is a stretch of genomic DNA that serves as a transcriptional template for a multi-ORF mRNA, and the simplest gene cluster consists of a single operon. Here, all of the genes in the operon are physiologically tied together by their cotranscription. However, gene clusters are often larger and more complex than single operons. For example, the choline utilization (cut) gene cluster, which allows microbes to use choline as a source of carbon and energy under anaerobic conditions, contains multiple apparent operons, all of which are transcribed in the same direction.4 It is also quite common to observe gene clusters consisting of two operons that are arranged “head-to-head”, divergently transcribed from the same DNA segment. An example of this type of gene cluster is the D. desulfuricans DSM 642 isethionate metabolism gene cluster.13 Regulatory proteins are also often found oriented for divergent transcription just upstream of an operon.14 Finally, some gene clusters are chaotically organized, containing functionally related genes in multiple operons that are divergently transcribed with no apparent pattern. One example is the cre gene cluster that biosynthesizes cremeomycin in Streptomyces cremeus (Figure 1B).15 Such gene clusters are particularly difficult to identify. It is therefore important to keep in mind the many structural varieties of gene clusters.

Figure 1. Structure-guided approaches for biosynthetic gene cluster (BGC) identification. (A) Discovery of the alanosine BGC. (B) Discovery of the cremeomycin BGC. PLP = pyridoxal phosphate; NAD+ = nicotinamide adenine dinucleotide; 3,4-AHBA = 3-amino-4-hydroxybenzoic acid; DHAP = dihydroxyacetone phosphate.
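
The operon and gene cluster definitions above can be illustrated with a small sketch. It is not a method from this Account: it assumes that adjacent genes on the same strand, separated by a short intergenic gap, may be cotranscribed, and it uses an arbitrary 100 bp cutoff (`max_gap`) chosen only for illustration. The gene names and coordinates are hypothetical, and real operon prediction draws on further evidence such as promoters, terminators, and expression data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str    # hypothetical label, not a real locus tag
    start: int   # leftmost coordinate on the reference strand
    end: int     # rightmost coordinate on the reference strand
    strand: str  # "+" or "-"


def putative_operons(genes: List[Gene], max_gap: int = 100) -> List[List[Gene]]:
    """Group neighboring same-strand genes into putative operons.

    Assumption (illustrative only): two adjacent genes belong to the same
    operon if they share a strand and the gap between them is <= max_gap bp.
    """
    operons: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            if gene.strand == prev.strand and gene.start - prev.end <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


def is_divergent_pair(left: List[Gene], right: List[Gene]) -> bool:
    """True if two neighboring operons are arranged "head-to-head".

    The left operon must lie on the "-" strand and the right operon on the
    "+" strand, so both are transcribed away from the shared region.
    """
    return left[-1].strand == "-" and right[0].strand == "+"


# Hypothetical locus: two divergent operons plus one distant gene.
genes = [
    Gene("geneA", 100, 900, "-"),
    Gene("geneB", 950, 1800, "-"),
    Gene("geneC", 2100, 3000, "+"),
    Gene("geneD", 3020, 3900, "+"),
    Gene("geneE", 4500, 5200, "+"),
]

operons = putative_operons(genes)
for operon in operons:
    print([g.name for g in operon], operon[0].strand)
```

With these made-up coordinates, geneA and geneB form one putative operon, geneC and geneD form a second that is divergently transcribed from the first, and geneE is left alone because its gap exceeds the cutoff. A simple heuristic like this would miss the chaotically organized clusters described above, which is one reason such clusters are hard to identify.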
