ViroMatch: A Computational Pipeline for the Detection of Viral Sequences from Complex Metagenomic Data
Abstract:
ViroMatch is an automated pipeline that takes metagenomic sequencing reads as input and performs iterative nucleotide and translated nucleotide mapping to identify viral sequences. We provide a Docker image for ViroMatch so that users do not have to install dependencies.
Metagenomics provides a means of assessing, in a culture-independent manner, the total genetic pool of all the microbes in a particular environment. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the functional diversity encoded by their genomes; a large proportion of the genes involved are novel. Herein, we review both sequence-based and functional metagenomic methods for uncovering novel genes, and we outline some of the problems associated with each type of approach, along with potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome, and finally outline how the discovery of novel genes may be used to create bioengineered probiotics.
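Translated nucleotide mapping, mentioned in the ViroMatch abstract, generally works by translating each read in all six reading frames (three on the forward strand, three on the reverse complement) so that it can be compared with protein sequences. The Python sketch below shows only that translation step, using the standard genetic code (NCBI translation table 1). It is an illustration under that assumption, not ViroMatch's own code, and the function names are hypothetical. Codons containing ambiguous bases such as N are rendered as 'X'.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order with the first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown or ambiguous codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(read: str) -> dict[str, str]:
    """Translate a read in frames +1..+3 (forward) and -1..-3 (reverse)."""
    read = read.upper()
    rc = reverse_complement(read)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(read[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(frame, protein)
```

Each translated frame could then be aligned against a viral protein database by a translated search tool; that alignment step is outside the scope of this sketch.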
Covering: up to 2021
Metagenomics has yielded massive amounts of sequencing data, offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While genome-resolved information about microbial communities from nearly every environment on Earth is now available, accurately predicting biocatalytic functions directly from sequencing data remains challenging. Compared with primary metabolic pathways, enzymes involved in secondary metabolism often catalyze specialized reactions with diverse substrates, making these pathways rich resources for the discovery of new enzymology. To date, functional insights gained from studies of environmental DNA (eDNA) have largely relied on PCR- or activity-based screening of eDNA fragments cloned into fosmid or cosmid libraries. As an alternative, shotgun metagenomics holds underexplored potential for discovering new enzymes directly from eDNA, because it avoids the common biases introduced by PCR- or activity-guided functional metagenomics workflows. However, without direct links between genotype and phenotype, inferring new enzyme functions directly from eDNA is like searching for a 'needle in a haystack'. The goal of this review is to provide a roadmap for navigating shotgun metagenomic sequencing data and identifying new candidate biosynthetic enzymes. We cover both computational and experimental strategies to mine metagenomes and explore protein sequence space, with a spotlight on natural product biosynthesis. Specifically, we compare
Metagenomics has been successfully applied to isolate novel biocatalysts from the uncultured microbiota in the environment. Two types of screening have been used to identify clones carrying desired traits in metagenomic libraries: function-based screening and sequence-based screening. Each has its own advantages and disadvantages, and both have been applied successfully to discover biocatalysts from metagenomes. However, both strategies are laborious because screening hits are rare. A recent paper introduced a high-throughput screening strategy termed substrate-induced gene-expression screening (SIGEX). SIGEX is designed to select, using fluorescence-activated cell sorting (FACS), clones harboring catabolic genes that are induced by various substrates. This method was applied successfully to isolate aromatic hydrocarbon-induced genes from a metagenomic library. Although SIGEX has many limitations, it is expected to offer economic advantages, especially for industry.
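The idea of selecting clones whose genes respond to a substrate can be illustrated with a simplified, hypothetical model in which each clone has a reporter fluorescence reading measured without and with the substrate. The sketch below assumes a two-gate selection: clones that are already fluorescent without substrate are discarded, and the remaining clones are kept if they become fluorescent once the substrate is added. The `Clone` record, the function name and the threshold values are illustrative placeholders, not parameters of the published method.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    """Hypothetical record of one metagenomic clone's reporter signal."""
    clone_id: str
    uninduced: float  # fluorescence without substrate (arbitrary units)
    induced: float    # fluorescence after adding substrate (arbitrary units)


def select_induced_clones(clones, background_max=100.0, induced_min=500.0):
    """Keep clones that are dark without substrate but bright with it.

    Thresholds are illustrative placeholders, not published values.
    """
    # Gate 1: discard clones whose reporter is on even without substrate.
    not_constitutive = [c for c in clones if c.uninduced <= background_max]
    # Gate 2: keep clones whose reporter switches on in response to substrate.
    return [c for c in not_constitutive if c.induced >= induced_min]


if __name__ == "__main__":
    library = [
        Clone("c1", uninduced=20, induced=30),    # never fluorescent
        Clone("c2", uninduced=900, induced=950),  # expressed regardless of substrate
        Clone("c3", uninduced=40, induced=1200),  # substrate-induced
    ]
    print([c.clone_id for c in select_induced_clones(library)])
```

Applying the uninduced gate first removes clones whose reporter is expressed regardless of substrate, so the surviving hits are those whose expression depends on the substrate being present.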
Unculturable bacterial communities provide a rich source of biocatalysts, but their experimental discovery by functional metagenomics is difficult because the odds are stacked against the experimenter. Here we demonstrate functional screening of a million-membered metagenomic library in microfluidic picolitre droplet compartments. Using bait substrates, we identified new hydrolases for sulfate monoesters and phosphotriesters, mostly based on promiscuous activities presumed not to be under selection pressure. Spanning three protein superfamilies, these hits break new ground in sequence space: promiscuity now connects enzymes with only distantly related sequences. Most hits could not have been predicted by sequence analysis, because the desired activities have never been ascribed to similar sequences; this shows how the approach complements bioinformatic harvesting of metagenomic sequencing data. Functional screening of a library of unprecedented size with excellent assay sensitivity has been instrumental in identifying rare genes that constitute catalytically versatile hubs in sequence space, as potential starting points for the acquisition of new functions.
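Screening a library in droplet compartments typically depends on loading clones at low density so that most occupied droplets hold a single clone. Assuming cells are encapsulated at random, the number of cells per droplet follows a Poisson distribution with mean λ (`lam` below). The following sketch computes the resulting occupancy fractions; λ and the function names are illustrative assumptions, not values or code from the study.

```python
import math


def poisson_occupancy(lam: float, k: int) -> float:
    """Probability that a droplet holds exactly k cells when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def loading_summary(lam: float) -> dict[str, float]:
    """Fractions of empty, single-cell and multi-cell droplets for mean lam."""
    if lam <= 0:
        raise ValueError("mean occupancy lam must be positive")
    empty = poisson_occupancy(lam, 0)
    single = poisson_occupancy(lam, 1)
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": occupied - single,
        # Share of non-empty droplets that contain exactly one cell.
        "single_given_occupied": single / occupied,
    }


if __name__ == "__main__":
    for lam in (0.1, 0.3, 1.0):
        summary = loading_summary(lam)
        print(lam, {key: round(value, 3) for key, value in summary.items()})
```

With λ = 0.3, roughly 74% of droplets are empty and about 86% of occupied droplets hold exactly one cell; lowering λ raises that single-cell share at the cost of more empty droplets.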
Metagenomic studies in diverse environments have generated petabytes of sequencing data, allowing biologists to peer into the uncultivated microbial majority with unprecedented clarity. Such advances have heralded a new era of enzyme discovery in which proteins of interest can be extracted directly from metagenomic sequencing data, although this remains a challenging task. Traditionally, metagenomic enzyme discovery, including the discovery of enzymes involved in xenobiotic degradation, has largely relied on functional insights derived from activity-guided or PCR-based metagenomics. Because of its untargeted and holistic nature, metagenomics allows us to probe unknown and underexplored microbial diversity, which represents a key resource for novel biocatalyst discovery. Additionally, enzyme discovery based on metagenomic shotgun sequencing avoids the common biases introduced by PCR-based or activity-guided functional genomic methods. In this chapter, we provide an overview of metagenomics in novel enzyme discovery, discussing both its experimental and computational aspects. We discuss in detail computational strategies for identifying possible enzyme candidates from shotgun sequencing data and experimental strategies for characterizing candidate enzymes once they have been identified. Finally, we review emerging methods of metagenomic enzyme discovery, as well as future goals and challenges, with an emphasis on metagenomics-based approaches.
Real-world evaluations of metagenomic reconstructions are challenged by the difficulty of distinguishing reconstruction artefacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only, and hybrid assembly approaches on four metagenomic samples of varying complexity and demonstrate how these approaches affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to evaluate the protein predictions derived from the metagenomic data. Our findings pave the way for critical assessments of metagenomic reconstructions, and we propose a reference-independent solution, based on the synergistic effects of multi-omic data integration, for the in situ study of microbiomes using long-read sequencing data.