The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions

2008 
Nucleic Acids Research

Victor M. Markowitz 1, Ernest Szeto 1, Krishna Palaniappan 1, Yuri Grechkin 1, Ken Chu 1, I-Min A. Chen 1, Inna Dubchak 2, Iain Anderson 3, Athanasios Lykidis 3, Konstantinos Mavromatis 3, Natalia N. Ivanova 3 and Nikos C. Kyrpides 3

1 Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
2 Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
3 Genome Biology Program, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, USA

ABSTRACT

The Integrated Microbial Genomes (IMG) system is a data management, analysis and annotation platform for all publicly available genomes. IMG contains both draft and complete JGI microbial genomes integrated with all other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and annotating genomes, genes and functions, individually or in a comparative context. Since its first release in 2005, IMG’s data content and analytical capabilities have been constantly expanded through quarterly releases. IMG is provided by the DOE-Joint Genome Institute (JGI) and is available from http://img.jgi.doe.gov.

INTRODUCTION

With about 20% of the reported genome projects worldwide, DOE-JGI is one of the main production centers of genome sequence data (1). IMG serves as a community resource for comparative analysis and annotation of all publicly available genomes from all three domains of life, in a uniquely integrated context. Starting with version 2.0 released in December 2006, IMG has employed NCBI’s RefSeq (2) as its main source of publicly available genomes.

Through regular updates, IMG’s data content has grown from a total of 296 genomes in its first version, released in March 2005, to a total of 2,878 genomes in the version released in September 2007. New archaeal and bacterial genomes are added to IMG on a quarterly basis: IMG 2.3 (Sep 2007) has 729 bacterial and 46 archaeal genomes. An increasing number of eukaryotic genomes, viruses (including phages) and plasmids have also been added to IMG in order to increase its genomic context for comparative analysis: IMG 2.3 has 50 eukaryotic genomes, 1,661 viruses, and 402 plasmids that did not come from a specific microbial genome sequencing project.

IMG’s analytical tools have been gradually generalized and enhanced in terms of their usability, analysis flow, and performance. These tools allow users to focus on a subset of genes, genomes, and functions of interest, and conduct analysis using summary tables, graphical viewers, and various methods for comparing genes, pathways and functions across genomes.

DATA CONTENT AND CURATION

Genomes are identified in IMG via their taxonomic lineage (domain, phylum, class, order, family, genus, species, strain). For every genome, IMG incorporates its primary genome sequence information recorded in RefSeq, including its organization into scaffolds and/or contigs, together with computationally predicted protein-coding sequences (CDSs) and some RNA-coding genes. IMG employs RefSeq’s gene identifiers to link to other NCBI resources, such as Entrez Gene (3), and to establish gene-based correlations with other microbial genome systems, such as Microbes Online (4).

Functional annotation of genes in IMG involves: (a) protein product assignment, (b) protein family and domain characterization, (c) IMG term assignment and (d) MyIMG protein function assignment. Protein product assignments are available from RefSeq and typically consist of the function prediction provided by genome sequencing centers.
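As a rough illustration of the data model described above, the following Python sketch represents a genome by its taxonomic lineage and a gene by the four kinds of functional annotation (a) to (d). All class and field names are hypothetical and chosen only for illustration; they are not IMG's actual schema or API, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Taxonomic ranks used to identify a genome, in the order listed in the text.
LINEAGE_RANKS = (
    "domain", "phylum", "class", "order",
    "family", "genus", "species", "strain",
)


@dataclass
class Gene:
    """Hypothetical gene record; field names are assumptions, not IMG's schema."""
    gene_id: str                                        # e.g. a RefSeq gene identifier
    product: Optional[str] = None                       # (a) protein product assignment
    families: List[str] = field(default_factory=list)   # (b) protein families and domains
    img_terms: List[str] = field(default_factory=list)  # (c) IMG term assignments
    myimg_function: Optional[str] = None                # (d) MyIMG protein function assignment


@dataclass
class Genome:
    """Hypothetical genome record keyed by its taxonomic lineage."""
    lineage: Dict[str, str]
    genes: List[Gene] = field(default_factory=list)

    def lineage_string(self) -> str:
        # Join the ranks in a fixed order; ranks with no value are marked explicitly.
        return "; ".join(
            self.lineage.get(rank, "unclassified") for rank in LINEAGE_RANKS
        )


if __name__ == "__main__":
    genome = Genome(
        lineage={"domain": "Bacteria", "genus": "Escherichia"},
        genes=[Gene(gene_id="example-gene-1", product="hypothetical protein")],
    )
    print(genome.lineage_string())
```

Keeping the four annotation kinds as separate fields mirrors the distinction the text draws between assignments inherited from RefSeq and those added within IMG.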
Protein family and domain characterization involves