Taxonomy-aware, sequence similarity ranking reliably predicts phage-host relationships

2021 
Motivation: Similar regions in virus and host genomes provide strong evidence for phage-host interaction, and BLAST is one of the leading tools to predict hosts from phage sequences. However, BLAST-based host prediction has three limitations: (i) top-scoring prokaryotic sequences do not always point to the actual host, (ii) mosaic phage genomes may produce matches to many, typically related, bacteria, and (iii) phage and host sequences may diverge beyond the point where their relationship can be detected by a BLAST alignment.

Results: We created an extension to BLAST, named Phirbo, that improves host prediction quality beyond what is obtainable from standard BLAST searches. The tool uses information on sequence similarity and bacterial relatedness to predict phage-host interactions. Phirbo was evaluated on two benchmark sets of known phage-host pairs, and it improved precision and recall by 25 percentage points, as well as the discriminatory power for the recognition of phage-host relationships by 10 percentage points (Area Under the Curve = 0.95). Phirbo also yielded a mean host prediction accuracy of 60% and 70% at the genus and family levels, respectively, representing a 5% improvement over BLAST. When using only a fraction of phage genome sequences (3 kb), the prediction accuracy of Phirbo was 5-11% higher than that of BLAST at all taxonomic levels.

Conclusion: Our results suggest that Phirbo is an effective, unsupervised tool for predicting phage-host relationships.

Availability: Phirbo is available at https://github.com/aziele/phirbo