Microbial communities have a profound impact on both human health and various environments. Viruses infecting bacteria, known as bacteriophages or phages, play a key role in modulating bacterial communities within environments. High-quality phage genome sequences are essential for advancing our understanding of phage biology, enabling comparative genomics studies and developing phage-based diagnostic tools. Most available viral identification tools consider individual sequences to determine whether they are of viral origin. Because viral assembly is challenging, genomes can become fragmented, and existing tools may recover only incomplete genome fragments. Therefore, the identification and characterization of novel phage genomes remain a challenge, creating a need for improved approaches to phage genome recovery.
The microbiome is an essential part of most ecosystems. It was originally studied mostly through culturing, but relatively few microbes can be cultured, so much of the microbiome was left unexplored. The emergence of metagenomic sequencing techniques changed that and allowed the study of microbiomes from all sorts of habitats. Metagenomic sequencing also allowed for a more thorough exploration of prophages, viruses that integrate into bacterial genomes, and how they benefit their hosts. One issue with using open-access metagenomic data is that sequences added to databases often have little to no metadata to work with, so finding enough sequences can be difficult. Many metagenomes have been manually curated, but this is a time-consuming process that relies heavily on the uploader being accurate and thorough when filling in metadata fields and on the curators working with the same ontologies. Using algorithms to automatically sort metagenomes based on either the taxonomic profile or the functional profile may be a viable solution to the issues with manually curated metagenomes, but it requires that the algorithm be trained on carefully curated datasets and use the most informative profile possible in order to minimize errors.
Phages integrated into a bacterial genome – called prophages – continuously monitor the vigour of the host bacteria to determine when to escape the genome and to protect their host from other phage infections, and they may provide genes that promote bacterial growth. Prophages are essential to almost all microbiomes, including the human microbiome. However, most human microbiome studies have focused on bacteria, ignoring free and integrated phages, so we know little about how these prophages affect the human microbiome. To address this gap in our knowledge, we compared the prophages identified in 14 987 bacterial genomes isolated from human body sites to characterize prophage DNA in the human microbiome. Here, we show that prophage DNA is ubiquitous, comprising on average 1–5 % of each bacterial genome. The prophage content per genome varies with the isolation site on the human body, the health of the human and whether the disease was symptomatic. The presence of prophages promotes bacterial growth and sculpts the microbiome. However, the disparities caused by prophages vary throughout the body.
Abstract Motivation Phage therapy is a viable alternative for treating bacterial infections amidst the escalating threat of antimicrobial resistance. However, the therapeutic success of phage therapy depends on selecting safe and effective phage candidates. While experimental methods focus on isolating phages and determining their lifecycle and host range, comprehensive genomic screening is critical to identify markers that indicate potential risks, such as toxins, antimicrobial resistance, or temperate lifecycle traits. These analyses are often labor-intensive and time-consuming, limiting the rapid deployment of phage in clinical settings. Results We developed Sphae, an automated bioinformatics pipeline designed to assess the therapeutic potential of a phage in under ten minutes. Using the Snakemake workflow manager, Sphae integrates tools for quality control, assembly, genome assessment, and annotation tailored specifically for phage biology. Sphae automates the detection of key genomic markers, including virulence factors, antimicrobial resistance genes, and lysogeny indicators like integrase, recombinase, and transposase, which could preclude therapeutic use. Benchmarked on 65 phage sequences, 28 phage samples showed therapeutic potential, 8 failed during assembly due to low sequencing depth, 22 samples included prophage or virulence markers, and the remaining 23 samples included multiple phage genomes per sample. This workflow outputs a comprehensive report, enabling rapid assessment of phage safety and suitability for phage therapy under these criteria. Sphae is scalable and portable, facilitating efficient deployment across most high-performance computing (HPC) and cloud platforms and expediting the genomic evaluation process. Availability Sphae source code is freely available at https://github.com/linsalrob/sphae, with installation supported through Conda, PyPI, and Docker containers.
Abstract Motivation Phage therapy offers a viable alternative for bacterial infections amid rising antimicrobial resistance. Its success relies on selecting safe and effective phage candidates, which requires comprehensive genomic screening to identify potential risks. However, this process is often labor-intensive and time-consuming, hindering rapid clinical deployment. Results We developed Sphae, an automated bioinformatics pipeline designed to assess the therapeutic potential of a phage in under 10 minutes. Using the Snakemake workflow manager, Sphae integrates tools for quality control, assembly, genome assessment, and annotation tailored specifically for phage biology. Sphae automates the detection of key genomic markers, including virulence factors, antimicrobial resistance genes, and lysogeny indicators such as integrase, recombinase, and transposase, which could preclude therapeutic use. Among the 65 phage sequences analyzed, 28 showed therapeutic potential, 8 failed due to low sequencing depth, 22 contained prophage or virulence markers, and 23 had multiple phage genomes. This workflow produces a report to assess phage safety and therapy suitability quickly. Sphae is scalable and portable, facilitating efficient deployment across most high-performance computing and cloud platforms and accelerating the genomic evaluation process. Availability and implementation Sphae source code is freely available at https://github.com/linsalrob/sphae, with installation supported through Conda, PyPI, and Docker containers.
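The lysogeny-indicator screen described above can be illustrated with a minimal sketch. It is not Sphae's implementation: it assumes, purely for illustration, that annotation yields a list of gene product descriptions and that a simple keyword match is sufficient. The function name `flag_lysogeny_markers` and the example product strings are hypothetical; only the three marker names come from the abstract.

```python
# Illustrative sketch only; NOT how Sphae itself detects markers.
# Assumption: annotation produces free-text gene product descriptions.

# Lysogeny indicators named in the abstract.
LYSOGENY_KEYWORDS = ("integrase", "recombinase", "transposase")


def flag_lysogeny_markers(products):
    """Group product descriptions by the lysogeny keyword they contain.

    Returns a dict mapping each matched keyword to the list of product
    descriptions that mention it (case-insensitive substring match).
    """
    hits = {}
    for product in products:
        lowered = product.lower()
        for keyword in LYSOGENY_KEYWORDS:
            if keyword in lowered:
                hits.setdefault(keyword, []).append(product)
    return hits


# Hypothetical annotation output for one phage genome.
example_products = [
    "Major capsid protein",
    "Tyrosine integrase",
    "DNA polymerase",
    "IS3 family transposase",
]
example_hits = flag_lysogeny_markers(example_products)
# A non-empty result would mark this genome for closer review.
needs_review = bool(example_hits)
```

A keyword match like this is deliberately conservative: it flags candidates for review rather than deciding on its own that a phage is temperate.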
Abstract Most bacterial genomes contain integrated bacteriophages, or prophages, in various states of decay. Many are active and able to excise from the genome and replicate, while others are cryptic prophages, remnants of their former selves. Over the last two decades, many computational tools have been developed to identify the prophage components of bacterial genomes, and it is a particularly active area for the application of machine learning approaches. However, progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms. Here, we present a library of gold-standard bacterial genome annotations that include manually curated prophage annotations, and a computational framework to compare the predictions from different algorithms. We use this suite to compare all extant stand-alone prophage prediction algorithms to identify their strengths and weaknesses. We provide a FAIR dataset for prophage identification, and demonstrate the accuracy, precision, recall, and F1 score from the analysis of seven different algorithms for the prediction of prophages. We discuss caveats and concerns in this analysis and how those concerns may be mitigated.
Abstract Phages integrated into a bacterial genome, called prophages, continuously monitor the health of the host bacteria to determine when to escape the genome and to protect their host from other phage infections, and they may provide genes that promote bacterial growth. Prophages are essential to almost all microbiomes, including the human microbiome. However, most human microbiome studies focus on bacteria, ignoring free and integrated phages, so we know little about how these prophages affect the human microbiome. We compared the prophages identified in 11,513 bacterial genomes isolated from human body sites to characterise prophage DNA in the human microbiome. Here, we show that prophage DNA comprised an average of 1–5% of each bacterial genome. The prophage content per genome varies with the isolation site on the human body, the health of the human, and whether the disease was symptomatic. The presence of prophages promotes bacterial growth and sculpts the microbiome. However, the disparities caused by prophages vary throughout the body.
Background Most bacterial genomes contain integrated bacteriophages—prophages—in various states of decay. Many are active and able to excise from the genome and replicate, while others are cryptic prophages, remnants of their former selves. Over the last two decades, many computational tools have been developed to identify the prophage components of bacterial genomes, and it is a particularly active area for the application of machine learning approaches. However, progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms. Methods We present a library of gold-standard bacterial genome annotations that include manually curated prophage annotations, and a computational framework to compare the predictions from different algorithms. We use this suite to compare all extant stand-alone prophage prediction algorithms to identify their strengths and weaknesses. We provide a FAIR dataset for prophage identification, and demonstrate the accuracy, precision, recall, and f1 score from the analysis of seven different algorithms for the prediction of prophages. Results We identified different strengths and weaknesses between the prophage prediction tools. Several tools exhibit exceptional f1 scores, while others have better recall at the expense of more false positives. The tools vary greatly in runtime performance with few exhibiting all desirable qualities for large-scale analyses. Conclusions Our library of gold-standard prophage annotations and benchmarking framework provide a valuable resource for exploring strengths and weaknesses of current and future prophage annotation tools. We discuss caveats and concerns in this analysis, how those concerns may be mitigated, and avenues for future improvements. This framework will help developers identify opportunities for improvement and test updates. 
It will also help users in determining the tools that are best suited for their analysis.
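The four metrics named in the Methods can be computed from confusion-matrix counts. The sketch below uses the standard definitions and assumes that predictions are compared with the curated annotations item by item (for example, per gene; the abstract does not state the unit of comparison). The function name and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion counts.

    tp: items correctly predicted as prophage
    fp: items wrongly predicted as prophage
    tn: items correctly predicted as non-prophage
    fn: prophage items the tool missed
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for one tool on one genome (not data from the study).
example_metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
```

The trade-off mentioned in the Results follows directly from these definitions: a tool that predicts more prophage regions lowers `fn`, which raises recall, but typically raises `fp`, which lowers precision.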
Species identification following shark-related incidents is critical for effective incident management and for collecting data to inform shark bite mitigation strategies. Witness statements are not always reliable, and identification is often ambiguous or missing. Alternative methods for species identification include morphological assessments of bite marks, analysis of teeth collected at the scene of the incident, and genetic approaches. However, limited access to appropriate collection media and robust genetic assays has restricted the use of genetic technologies. Here, we present a case study that provided a unique opportunity for an experimental design to compare the effectiveness of medical gauze, readily available in first-aid kits, and forensic-grade swabs in collecting genetic material for shark-species identification. Sterile medical gauze and forensic-grade swabs were used to collect transfer DNA from the bite margins on a bitten surf ski, and these samples were compared to a piece of shark tissue found embedded along the bite margin. Witness accounts and the characteristics of the bite mark impressions implied the involvement of a Carcharodon carcharias (white shark). The morphology of a tooth found in the surf ski, however, suggested it belonged to an Orectolobus sp. (wobbegong). Genetic analysis of DNA transferred from the shark to the surf ski included the application of a broad-target nested PCR assay followed by Sanger sequencing, with white shark contribution to the 'total sample DNA' determined with a species-specific qPCR assay. The results of the genetic analyses were congruent between sampling methods with respect to species identification and the level of activity inferred by the donor-specific DNA contribution. These data also supported the inferences drawn from the bite mark morphology. DNA from the recovered tooth was PCR amplified with a wobbegong-specific primer pair designed for this study to corroborate the tooth's morphological identification.
Following the validation of gauze used for sampling in the case study event, two additional isolated incidents occurred and were sampled in situ by external personnel using gauze of the kind typically found in a first-aid kit. DNA extracted from these gauze samples resulted in the identification of a white shark as the donor of the DNA collected from the bite marks in both instances. This study, involving three incidents separated by time and location, represents the seminal application of gauze as a sampling medium after critical human-shark interactions and strongly supports the practical implementation of these methods in the field.
Background Most bacterial genomes contain integrated bacteriophages, or prophages, in various states of decay. Many are active and able to excise from the genome and replicate, while others are cryptic prophages, remnants of their former selves. Over the last two decades, many computational tools have been developed to identify the prophage components of bacterial genomes, and it is a particularly active area for the application of machine learning approaches. However, progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms. Methods We present a library of gold-standard bacterial genomes with manually curated prophage annotations, and a computational framework to compare the predictions from different algorithms. We use this suite to compare all extant stand-alone prophage prediction algorithms and identify their strengths and weaknesses. We provide a FAIR dataset for prophage identification, and demonstrate the accuracy, precision, recall, and F1 score from the analysis of ten different algorithms for the prediction of prophages. Results We identified strengths and weaknesses among the prophage prediction tools. Several tools exhibit exceptional F1 scores, while others have better recall at the expense of more false positives. The tools vary greatly in runtime performance with few exhibiting all desirable qualities for large-scale analyses. Conclusions Our library of gold-standard prophage annotations and benchmarking framework provide a valuable resource for exploring strengths and weaknesses of current and future prophage annotation tools. We discuss caveats and concerns in this analysis, how those concerns may be mitigated, and avenues for future improvements. This framework will help developers identify opportunities for improvement and test updates. It will also help users in determining the tools that are best suited for their analysis.