    Additional file 8: Table S7. of Exploring the rice dispensable genome using a metagenome-like assembly strategy
    Abstract:
    The alignment of predicted proteins of the indica dispensable genome to proteins coded by cloned genes of the Nipponbare genome using Blastp. (XLS 33 kb)
    Keywords:
    Table (database)
    Sequence assembly
    Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the problems associated with each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome. Finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics.
    Human Microbiome Project
    Citations (113)
    Metagenomic sequencing of complete microbial communities has greatly enhanced our understanding of the taxonomic composition of microbiotas. This has led to breakthrough developments in bioinformatic disciplines such as assembly, gene clustering and metagenomic binning of species genomes, and to the discovery of an incredible, previously unknown taxonomic diversity. However, functional annotation and the estimation of metabolic processes from single species, or from communities, are still challenging. Earlier approaches relied mostly on inferring the presence of key enzymes for metabolic pathways in the whole metagenome, ignoring the genomic context of such enzymes, resulting in the 'bag-of-genes' approach to estimating the functional capacities of microbiotas. Here, we review recent developments in metagenomic bioinformatics, with a special focus on emerging technologies to simulate and estimate metabolic information that can be derived from metagenome-assembled genomes. Genome-scale metabolic models can be used to model the emergent properties of microbial consortia and whole communities, and the progress in this area is reviewed. While this subfield of metagenomics is still in its infancy, it is becoming evident that there is a dire need for further bioinformatic tools to address the complex combinatorial problems in modelling the metabolism of large communities as a 'bag-of-genomes'.
    Citations (78)
    Metagenomics has been successfully applied to isolate novel biocatalysts from the uncultured microbiota in the environment. Two types of screening have been used to identify clones carrying desired traits from metagenomic libraries: function-based screening and sequence-based screening. Both function-based and sequence-based screening have individual advantages and disadvantages, and both have been applied successfully to discover biocatalysts from metagenomes. However, both strategies are laborious and tedious because of the low frequency of screening hits. A recent paper introduced a high-throughput screening strategy, termed substrate-induced gene-expression screening (SIGEX). SIGEX is designed to select clones harboring catabolic genes induced by various substrates, in concert with fluorescence-activated cell sorting (FACS). This method was applied successfully to isolate aromatic hydrocarbon-induced genes from a metagenomic library. Although SIGEX has many limitations, it is expected to provide economic advantages, especially to industry.
    High-Throughput Screening
    Citations (100)
    In principle, tandem mass spectrometry can be used to detect and quantify the peptides present in a microbiome sample, enabling functional and taxonomic insight into microbiome metabolic activity. However, the phylogenetic diversity constituting a particular microbiome is often unknown, and many of the organisms present may not have assembled genomes. In ocean microbiome samples, with particularly diverse and uncultured bacterial communities, it is difficult to construct protein databases that contain the bulk of the peptides in the sample without losing detection sensitivity due to the overwhelming number of candidate peptides for each tandem mass spectrum. We describe a method for deriving "metapeptides" (short amino acid sequences that may be represented in multiple organisms) from shotgun metagenomic sequencing of microbiome samples. In two ocean microbiome samples, we constructed site-specific metapeptide databases to detect more than one and a half times as many peptides as by searching against predicted genes from an assembled metagenome and roughly three times as many peptides as by searching against the NCBI environmental proteome database. The increased peptide yield has the potential to enrich the taxonomic and functional characterization of sample metaproteomes.
    Shotgun
    Shotgun proteomics
    Metaproteomics
    Unculturable bacterial communities provide a rich source of biocatalysts, but their experimental discovery by functional metagenomics is difficult, because the odds are stacked against the experimenter. Here we demonstrate functional screening of a million-membered metagenomic library in microfluidic picolitre droplet compartments. Using bait substrates, new hydrolases for sulfate monoesters and phosphotriesters were identified, mostly based on promiscuous activities presumed not to be under selection pressure. Spanning three protein superfamilies, these break new ground in sequence space: promiscuity now connects enzymes with only distantly related sequences. Most hits could not have been predicted by sequence analysis, because the desired activities have never been ascribed to similar sequences, showing how this approach complements bioinformatic harvesting of metagenomic sequencing data. Functional screening of a library of unprecedented size with excellent assay sensitivity has been instrumental in identifying rare genes constituting catalytically versatile hubs in sequence space as potential starting points for the acquisition of new functions.
    Sequence space
    Sequence (biology)
    Chemical space
    Citations (257)
    Next-generation sequencing technologies permit metagenomic studies to characterize the entire bacterial community within an environment by producing a large number of short, noisy DNA reads. One of the most challenging computational tasks is to assemble millions of short reads into longer contigs, which are used as the basis of subsequent computational analyses. Several de novo assembly methods geared towards single genomes have been tuned and applied to metagenomic data sets, but very little progress has been made in comparative assembly for metagenomics. In addition, more and more bacterial genome sequences are becoming available, providing a great opportunity to conduct reference-assisted assembly. In this project, we introduce a computational tool for comparative assembly of metagenomic sequences. Our software first selects reference genomes based on taxonomic profiles estimated by MetaPhyler, and then metagenomic reads are quickly mapped to the reference genomes. When building contigs, we employ a greedy solution of the minimum set-covering problem to produce longer contigs. Furthermore, we propose a hybrid assembly approach, which shows significantly better results than either comparative or de novo assembly does individually. We analyzed two mock and 728 real metagenomic samples from the Human Microbiome Project, and achieved results comparable with the state-of-the-art de novo assemblers. Through our proposed hybrid approach, we assembled 79% of the reads into contigs of at least 300 bp.
    Sequence assembly
    Bacterial genome size
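The comparative-assembly abstract above mentions a greedy solution of the minimum set-covering problem when building contigs. The snippet below is a minimal sketch of the generic greedy set-cover heuristic only; it is not the tool's actual implementation. The function name `greedy_set_cover`, the read names, and the representation of each mapped read as the set of reference positions it covers are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(universe: Set[int],
                     candidates: Dict[Hashable, Set[int]]) -> List[Hashable]:
    """Pick candidate sets greedily until the universe is covered.

    At each step, choose the candidate that covers the most still-uncovered
    elements. Ties are broken by the order of keys in `candidates`.
    Stops early if no candidate adds coverage (the universe may then remain
    partly uncovered).
    """
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered and candidates:
        # max() returns the first key with the largest gain.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # nothing left that any candidate can cover
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical example: four reads mapped to reference positions 0..9.
reads = {
    "r1": set(range(0, 6)),   # covers positions 0-5
    "r2": set(range(3, 8)),   # covers positions 3-7
    "r3": set(range(6, 10)),  # covers positions 6-9
    "r4": set(range(0, 3)),   # covers positions 0-2
}
chosen = greedy_set_cover(set(range(10)), reads)
print(chosen)  # ['r1', 'r3']
```

In this toy input, `r1` is chosen first because it covers six positions, and `r3` then covers the remaining four, so two reads suffice. The greedy heuristic does not guarantee a minimum cover in general, but it is a standard approximation for this NP-hard problem.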
    Background: Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via next-generation sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes or genomes, are formed from shorter NGS reads. However, the contigs generated by de novo assembly are often very fragmented and rarely longer than a few kilobase pairs (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs. Results: To facilitate this process, we propose a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extending strategy that explores multiple extending paths to achieve highly accurate longer contigs. We demonstrate that ContigExtender outperforms existing tools on synthetic, animal, and human metagenomics datasets. Conclusions: A novel software tool, ContigExtender, has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated in most viral metagenomics analysis pipelines for a wide variety of applications, including pathogen detection and viral discovery.
    Sequence assembly
    Citations (16)
    Metagenomic studies in diverse environments have generated petabytes of sequencing data, allowing biologists to peer into the uncultivated microbial majority with unprecedented clarity. Such advances have heralded a new era of enzyme discovery where proteins of interest can be directly extracted from metagenomic sequencing data, although this remains a challenging task. Traditionally, metagenomic enzyme discovery, including that involved in xenobiotic degradation, has largely relied on functional insights derived from activity-guided or PCR-based metagenomics. Due to its untargeted and holistic nature, metagenomics allows us to probe the unknown and underexplored microbial diversity, which represents a key resource for novel biocatalyst discovery. Metagenomic shotgun sequencing-based enzyme discovery, additionally, avoids common biases introduced through PCR-based or activity-guided functional genomic methods. In this chapter, we provide an overview of metagenomics in novel enzyme discovery, discussing both its experimental and computational aspects. We discuss in detail computational strategies for identifying possible enzyme candidates from shotgun sequencing data and experimental strategies for characterizing candidate enzymes once they have been identified. Finally, we review emerging methods of metagenomic enzyme discovery as well as future goals and challenges, with an emphasis on metagenomic-based approaches.