The bacterial and archaeal genomes sequenced to date were chosen mainly on the basis of the organisms' physiology, a reasonable criterion that has nonetheless resulted in a distinct phylogenetic bias. An alternative approach has been taken in the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project, which advocates choosing genomes based on the organism's phylogenetic position, with the aim of filling in the gaps in sequencing along the bacterial and archaeal branches of the tree of life. The value of this approach has been demonstrated by a pilot study of the genome sequences of 56 culturable species selected to maximize phylogenetic coverage. Analysis of the sequences provides insights into phylogenetics, protein function and genome annotation. There are now nearly 1,000 completed bacterial and archaeal genomes available, but as most of them were chosen for sequencing on the basis of their physiology, the data are limited by a highly biased phylogenetic distribution. To explore the value added by choosing microbial genomes for sequencing on the basis of their evolutionary relationships, the genomes of 56 species of Bacteria and Archaea selected to maximize phylogenetic coverage have now been sequenced and analysed. Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms [1]. There are now nearly 1,000 completed bacterial and archaeal genomes available [2], most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly biased phylogenetic distribution [3,4,5]. To explore the value added by choosing microbial genomes for sequencing on the basis of their evolutionary relationships, we have sequenced and analysed the genomes of 56 culturable species of Bacteria and Archaea selected to maximize phylogenetic coverage.
Analysis of these genomes demonstrated pronounced benefits (compared to an equivalent set of genomes randomly selected from the existing database) in diverse areas including the reconstruction of phylogenetic history, the discovery of new protein families and biological properties, and the prediction of functions for known genes from other organisms. Our results strongly support the need for systematic ‘phylogenomic’ efforts to compile a phylogeny-driven ‘Genomic Encyclopedia of Bacteria and Archaea’ in order to derive maximum knowledge from existing microbial genome data as well as from genome sequences to come.
Recent developments in the understanding of paralogous evolution have prompted a focus not only on obviously advantageous genes, but also on genes that can be considered to have a weak or sporadic impact on the survival of the organism. Here we examine the duplicative behavior of a category of genes that can be considered to be mostly transient in the genome, namely laterally transferred genes. Using both a compositional method and a gene-tree approach, we identify a number of proposed laterally transferred genes and study their nucleotide composition and frequency of duplication. It is found that duplications are significantly overrepresented among potential laterally transferred genes compared to the indigenous ones. Furthermore, the GC3 distribution of potential laterally transferred genes was found to be largely uniform in some genomes, suggesting an import from a broad range of donors. The results are discussed not in a context of strongly optimized established genes, but rather of genes with weak or ancillary functions. The importance of duplication may therefore depend on the variability and availability of weak genes for which novel functions may be discovered. Therefore, lateral transfer may accelerate the evolutionary process of duplication by bringing foreign genes that have mainly weak or no function into the genome.
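The GC3 measure used above (the G+C fraction at third codon positions) can be illustrated with a minimal Python sketch. The function names `gc3` and `gc3_histogram`, the bin count, and the toy sequences are assumptions made for illustration; they are not the authors' pipeline, and the sketch assumes in-frame coding sequences given as plain strings.

```python
def gc3(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon contributes no third position and is therefore ignored.
    """
    thirds = cds.upper()[2::3]  # bases at indices 2, 5, 8, ...
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def gc3_histogram(genes, bins=10):
    """Count genes per equal-width GC3 bin over the interval [0, 1]."""
    counts = [0] * bins
    for cds in genes:
        # A GC3 of exactly 1.0 is placed in the last bin.
        index = min(int(gc3(cds) * bins), bins - 1)
        counts[index] += 1
    return counts
```

Under these assumptions, a roughly flat histogram over a set of candidate transferred genes would correspond to the "largely uniform" GC3 distribution described in the abstract, whereas indigenous genes would be expected to cluster around the genome's typical GC3 value.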
By comparing two strains of Escherichia coli (K12 and O157:H7) with an outgroup of Salmonella and Klebsiella species and analyzing the sets of genes that are present or absent in any of the three groups, we study the gene history of K12 in particular since these bacteria diverged. Furthermore, by using a compositional method based on context bias, we evaluate not only recently imported genes but also deleted genes. In addition, we examine recent gene duplications in the two E. coli strains. It is found that turnover of DNA is high in E. coli and, more importantly, that turnover is highest for genes of low GC content. Although levels of import are high, most of the imported genes seem to be junk or have poorly understood functions. Nevertheless, selected genes do persist, and may even define some E. coli strains as pathogenic. Our results support the conclusion that some of the pathogenic islands in O157:H7 are likely to have been imported recently.
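The presence/absence comparison across the three groups can be sketched with plain set operations. This is only an illustration under stated assumptions: each genome is represented as a set of gene-family identifiers, the identifiers below are hypothetical, and the parsimony reading (a family found only in K12 is a candidate import, a family found in both O157:H7 and the outgroup but missing from K12 is a candidate deletion) is a simplification that is not taken from the abstract.

```python
def classify_k12_history(k12: set, o157: set, outgroup: set) -> dict:
    """Partition gene families by presence/absence across three groups.

    The labels are parsimony-style guesses (an assumption of this sketch),
    not a reconstruction of the published method.
    """
    return {
        # Present only in K12: most simply explained by a recent gain.
        "candidate_gain_k12": k12 - o157 - outgroup,
        # Present in O157:H7 and the outgroup but not in K12:
        # most simply explained by a loss in the K12 lineage.
        "candidate_loss_k12": (o157 & outgroup) - k12,
        # Present in all three groups.
        "shared_core": k12 & o157 & outgroup,
    }


# Toy input with hypothetical family identifiers.
result = classify_k12_history(
    k12={"fam1", "fam2", "fam5"},
    o157={"fam1", "fam3", "fam4"},
    outgroup={"fam1", "fam3"},
)
```

In this toy input, `fam2` and `fam5` fall into the candidate-gain class, `fam3` into the candidate-loss class, and `fam1` into the shared core; `fam4`, found only in O157:H7, is not classified because the sketch focuses on K12.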
<p>PDF file 3292K, Supplementary Figure S1. Genome wide shRNA screen information. Supplementary Figure S2. Validation of ATAD5 as a genetic determinant of olaparib response. Supplementary Figure S3. A working model of PARP1/2 inhibitor-induced DNA repair. Supplementary Figure S4. Inhibition of CDK12 sensitises serous ovarian cancer cells to olaparib and cisplatin. Supplementary Figure S5. CDK12 silencing and the impact on expression of DNA repair proteins. Supplementary Figure S6. Targeting of CCNK sensitises serous ovarian cancer cells to olaparib and cisplatin. Supplementary Figure S7. Animal body weights from in vivo study</p>
The usage of codons and nucleotide combinations varies along genes, and systematic variation produces gradients in usage. We have studied such gradients of nucleotides and nucleotide combinations and their immediate context in Escherichia coli. To distinguish mutational and selectional effects, the genes were subdivided into three groups with different codon usage bias and the gradients of nucleotide usage were studied in each group. Some combinations that can be associated with a propensity for processivity errors show strong negative gradients that become weaker in genes with low codon bias, consistent with a selection on translational efficiency. One of the strongest gradients is for third position G, which shows a pervasive positive gradient in usage in most contexts of surrounding bases.
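One simple way to quantify such a gradient is to pool genes by codon index, compute the frequency of G at the third position for each index, and fit a least-squares line through those frequencies. The Python sketch below does this; the function name, the `max_codons` cutoff, and the use of an ordinary least-squares slope are assumptions for illustration and may differ from the statistic used in the study.

```python
def third_position_g_gradient(genes, max_codons=100):
    """Least-squares slope of third-position G frequency versus codon index.

    `genes` is an iterable of in-frame coding sequences. Only the first
    `max_codons` codons of each gene are used (a hypothetical cutoff).
    """
    g_counts = [0] * max_codons
    totals = [0] * max_codons
    for cds in genes:
        thirds = cds.upper()[2::3][:max_codons]
        for i, base in enumerate(thirds):
            totals[i] += 1
            g_counts[i] += base == "G"

    # Keep only codon positions covered by at least one gene.
    points = [(i, g_counts[i] / totals[i])
              for i in range(max_codons) if totals[i]]
    if len(points) < 2:
        raise ValueError("need at least two codon positions to fit a slope")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

A positive return value indicates that third-position G becomes more frequent toward the 3' end of the pooled genes, and a negative value indicates the opposite. Applying the function separately to gene groups with high, medium, and low codon bias would mirror the three-group comparison described in the abstract.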
Background: Chromosomal rearrangements in the form of deletions, insertions, inversions and translocations are frequently observed in breast cancer genomes, and a subset of these rearrangements may play a crucial role in tumorigenesis. To identify novel somatic chromosomal rearrangements, we determined the genome structures of 15 hormone-receptor negative breast tumors by long-insert mate pair massively parallel sequencing. Results: We identified and validated 40 somatic structural alterations, including the recurring fusion between genes DDX10 and SKA3 and translocations involving the EPHA5 gene. Other rearrangements were found to affect genes in pathways involved in epigenetic regulation, mitosis and signal transduction, underscoring their potential role in breast tumorigenesis. RNA interference-mediated suppression of five candidate genes (DDX10, SKA3, EPHA5, CLTC and TNIK) led to inhibition of breast cancer cell growth. Moreover, downregulation of DDX10 in breast cancer cells led to an increased frequency of apoptotic nuclear morphology. Conclusions: Using whole genome mate pair sequencing and RNA interference assays, we have discovered a number of novel gene rearrangements in breast cancer genomes and identified DDX10, SKA3, EPHA5, CLTC and TNIK as potential cancer genes with impact on the growth and proliferation of breast cancer cells.
<p>PDF file 58K, Supplementary Table S4. DNA repair involvement of PARP inhibitor sensitisation genes found in the screen. PMID numbers refer to PubMed entries.</p>
Background: The extremely halophilic archaea are present worldwide in saline environments and have important biotechnological applications. Ten complete genomes of haloarchaea are now available, providing an opportunity for comparative analysis. Methodology/Principal Findings: We report here the comparative analysis of five newly sequenced haloarchaeal genomes with five previously published ones. Whole genome trees based on protein sequences provide strong support for deep relationships between the ten organisms. Using a soft clustering approach, we identified 887 protein clusters present in all halophiles. Of these core clusters, 112 are not found in any other archaea and therefore constitute the haloarchaeal signature. Four of the halophiles were isolated from water, and four were isolated from soil or sediment. Although there are few habitat-specific clusters, the soil/sediment halophiles tend to have greater capacity for polysaccharide degradation, siderophore synthesis, and cell wall modification. Halorhabdus utahensis and Haloterrigena turkmenica encode over forty glycosyl hydrolases each, and may be capable of breaking down naturally occurring complex carbohydrates. H. utahensis is specialized for growth on carbohydrates and has few amino acid degradation pathways. It uses the non-oxidative pentose phosphate pathway instead of the oxidative pathway, giving it more flexibility in the metabolism of pentoses. Conclusions: These new genomes expand our understanding of haloarchaeal catabolic pathways, providing a basis for further experimental analysis, especially with regard to carbohydrate metabolism. Halophilic glycosyl hydrolases for use in biofuel production are more likely to be found in halophiles isolated from soil or sediment.
Small-molecule inhibitors of PARP1/2, such as olaparib, have been proposed to serve as a synthetic lethal therapy for cancers that harbor BRCA1 or BRCA2 mutations. Indeed, in clinical trials, PARP1/2 inhibitors elicit sustained antitumor responses in patients with germline BRCA gene mutations. In hypothesizing that additional genetic determinants might direct use of these drugs, we conducted a genome-wide synthetic lethal screen for candidate olaparib sensitivity genes. In support of this hypothesis, the set of identified genes included known determinants of olaparib sensitivity, such as BRCA1, RAD51, and Fanconi's anemia susceptibility genes. In addition, the set included genes implicated in established networks of DNA repair, DNA cohesion, and chromatin remodeling, none of which were known previously to confer sensitivity to PARP1/2 inhibition. Notably, integration of the list of candidate sensitivity genes with data from tumor DNA sequencing studies identified CDK12 deficiency as a clinically relevant biomarker of PARP1/2 inhibitor sensitivity. In models of high-grade serous ovarian cancer (HGS-OVCa), CDK12 attenuation was sufficient to confer sensitivity to PARP1/2 inhibition, suppression of DNA repair via homologous recombination, and reduced expression of BRCA1. As one of only nine genes known to be significantly mutated in HGS-OVCa, CDK12 has properties that should confirm interest in its use as a biomarker, particularly in ongoing clinical trials of PARP1/2 inhibitors and other agents that trigger replication fork arrest.