DETIRE: A Hybrid Deep Learning Model for Identifying Viral Sequences from Metagenomes

2021 
A metagenome contains all the DNA sequences recovered from an environmental sample, including those of viruses, bacteria, fungi, actinomycetes and other organisms. Viruses are highly abundant and, as major pathogens, have caused extensive mortality and morbidity throughout human history, so detecting them in metagenomes is crucial for analysing the viral component of a sample and is the first step towards clinical diagnosis. However, detecting viral fragments directly from metagenomes remains difficult because of the large number of short sequences they contain. In this paper, a hybrid Deep lEarning model for idenTifying vIral sequences fRom mEtagenomes (DETIRE) is proposed to address this problem. First, a graph-based nucleotide sequence embedding strategy is used to enrich the representation of DNA sequences by training an embedding matrix. Then, spatial and sequential features are extracted by trained CNN and BiLSTM networks, respectively, to improve the feature representation of short sequences. Finally, the two sets of features are combined by weighting to make the final decision. Trained on 220,000 sequences of 500 bp subsampled from the Virus and Host RefSeq genomes, DETIRE identifies more short viral sequences (<1,000 bp) than three recent methods: DeepVirFinder, PPR-Meta and CHEER. DETIRE is freely available at https://github.com/crazyinter/DETIRE.
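
The data preparation and final decision step described above can be illustrated with a minimal sketch. It is not taken from the DETIRE repository. The function names, the k-mer size of 3, the equal branch weight and the 0.5 decision threshold are all assumptions made for illustration, and the CNN and BiLSTM branches are represented only by their output scores. The abstract itself states only the 500 bp fragment length and that the two feature sets are combined by weighting.

```python
import random

FRAGMENT_LEN = 500  # fragment length stated in the abstract
K = 3               # assumed k-mer size; the abstract does not give one


def subsample_fragments(genome, n, length=FRAGMENT_LEN, seed=0):
    """Draw n random fixed-length fragments from a genome string."""
    if len(genome) < length:
        raise ValueError("genome is shorter than the fragment length")
    rng = random.Random(seed)
    # Each start position is chosen so the fragment fits inside the genome.
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[s:s + length] for s in starts]


def kmer_tokens(seq, k=K):
    """Split a sequence into overlapping k-mers with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def combine_scores(cnn_score, bilstm_score, w=0.5):
    """Weighted combination of two branch scores, each assumed in [0, 1].

    w is a hypothetical weight on the CNN branch; the actual weighting
    scheme used by DETIRE is not described in the abstract.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * cnn_score + (1.0 - w) * bilstm_score


def is_viral(cnn_score, bilstm_score, w=0.5, threshold=0.5):
    """Call a fragment viral when the combined score reaches the threshold."""
    return combine_scores(cnn_score, bilstm_score, w) >= threshold
```

In this sketch the k-mer tokens would be the inputs that an embedding matrix maps to vectors before the two network branches see them. How the graph-based embedding is actually trained is left out because the abstract does not specify it.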