Comprehensive benchmarking of tools to identify phages in metagenomic shotgun sequencing data

2021 
Background: As the relevance of bacteriophages in shaping diversity in microbial ecosystems is becoming increasingly clear, the prediction of phage sequences in metagenomic datasets has become a topic of considerable interest, which has led to the development of many novel bioinformatic tools. A comprehensive comparative analysis of these tools has so far not been performed. Methods: We benchmarked ten state-of-the-art phage identification tools. To evaluate the precision, recall, and F1-scores of the tools, we used artificial contigs generated from complete RefSeq genomes representing phages, plasmids, and chromosomes, as well as a previously sequenced mock community containing four phage strains. In addition, a set of previously simulated viromes was used to assess diversity bias in each tool's output. Results: DeepVirFinder performed best across the datasets of artificial contigs and the mock community, with the highest F1-scores (0.98 and 0.61, respectively). Generally, machine learning-based tools performed better on the artificial contigs, while reference-based and machine learning-based tools performed comparably on the mock community. Most tools produced a viral genome set whose alpha and beta diversity patterns were similar to those of the original population, with the notable exception of Seeker, whose metrics differed significantly from the diversity of the underlying data. Conclusions: This study provides key metrics for assessing the performance of phage detection tools, offers a framework for further comparison of additional viral discovery tools, and discusses optimal strategies for using these tools.
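The precision, recall, and F1-score named in the Methods can be illustrated with a minimal sketch. It assumes each tool's output is reduced to a set of contig identifiers predicted as phage and that the true phage contigs are known, as they would be for artificial contigs. The function name, the contig identifiers, and the convention of returning 0.0 for an undefined ratio are illustrative assumptions and are not taken from the study's pipeline.

```python
def classification_metrics(predicted_phage, true_phage):
    """Return (precision, recall, F1) for one tool's phage predictions.

    Both arguments are iterables of contig identifiers (hypothetical names).
    Undefined ratios (zero denominator) are reported as 0.0 by assumption.
    """
    predicted = set(predicted_phage)
    truth = set(true_phage)

    tp = len(predicted & truth)   # phage contigs correctly flagged
    fp = len(predicted - truth)   # non-phage contigs flagged as phage
    fn = len(truth - predicted)   # phage contigs the tool missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example with made-up contig identifiers.
predicted = {"contig_1", "contig_2", "contig_3", "contig_4"}
truth = {"contig_1", "contig_2", "contig_3", "contig_5", "contig_6"}
precision, recall, f1 = classification_metrics(predicted, truth)
```

In this toy case three of the four predictions are correct and three of the five true phage contigs are recovered, so precision exceeds recall and the F1-score falls between the two.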