nf-rnaSeqMetagen: A Nextflow metagenomics pipeline for identifying and characterizing microbial sequences from RNA-seq data

2020 
Abstract
Metagenomics is a rapidly growing field aimed at identifying and characterizing the microbial genomes within diverse environmental samples. A key research area in metagenomics is the identification of non-host sequences within a host genomic background, which may represent microorganisms associated with the host. The aim of this study was to develop an efficient, portable and reproducible metagenomics pipeline for identifying and characterizing microbial reads from high-throughput RNA sequencing (RNA-seq) data. The nf-rnaSeqMetagen pipeline presented in this study was developed using Nextflow as a workflow management system to orchestrate the applications used in the pipeline and to handle input/output data between processes. All applications were containerized using Singularity to facilitate parallelization, portability and reproducibility. The pipeline takes RNA-seq reads as input and filters out reads belonging to the host organism. The remaining exogenous reads are then characterized using a kraken2 database constructed from bacterial, archaeal and viral genomes. RNA-seq data from skin samples of patients with systemic sclerosis (SSc) were used to test the pipeline and to identify possible pathogens, with the aim of better understanding the onset and progression of the disease. A number of bacterial species belonging to Arthrobacter, Bacillus, Brachybacterium, Dietzia and Pseudarthrobacter were found to be of clinical relevance and highly prevalent among the SSc patients. nf-rnaSeqMetagen was also extended to support other metagenomics studies using RNA-seq data and adapted to run on different computational platforms. The pipeline is freely available on GitHub (https://github.com/phelelani/nf-rnaSeqMetagen).
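The two stages summarized above, removal of host reads followed by taxonomic classification of the remaining exogenous reads, can be illustrated with a minimal Python sketch. This is not the pipeline's implementation. It assumes that host reads are identified by aligning to the host genome with a tool that emits SAM, treating records with FLAG bit 0x4 (segment unmapped) as exogenous, and that classifications are in Kraken2's standard per-read output (column 1 `C` or `U`, column 3 the taxonomy ID). Single-end reads are assumed, and the function names are hypothetical.

```python
"""Sketch of host-read removal and per-taxon tallying (hypothetical helpers).

Assumptions (not from the paper): host alignments are in SAM format, and
classification results use Kraken2's default per-read output columns.
"""
from collections import Counter
from typing import Iterable, Iterator

SAM_FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def exogenous_read_ids(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield names of (single-end) reads that did NOT align to the host."""
    for line in sam_lines:
        # Skip blank lines and header lines (QNAME can never start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & SAM_FLAG_UNMAPPED:
            yield qname


def tally_kraken2(lines: Iterable[str]) -> Counter:
    """Count classified reads per taxonomy ID from Kraken2 per-read output.

    Column 1 is 'C' (classified) or 'U' (unclassified); column 3 is the taxid.
    Unclassified reads are ignored.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "C":
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Toy inputs: r1 aligns to the host, r2 does not.
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(list(exogenous_read_ids(sam)))
    kraken = ["C\tr2\t1386\t50\t1386:16", "U\tr3\t0\t50\t0:16"]
    print(tally_kraken2(kraken))
```

In practice the exogenous read names would be used to extract those reads from the original FASTQ before classification; paired-end data would additionally need to check the mate-unmapped bit (0x8).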