Metagenome SNP Calling via Read Colored de Bruijn Graphs

2020 
MOTIVATION: Metagenomics refers to the study of complex samples containing the genetic content of multiple individual organisms, and has thus been used to elucidate the microbiome and resistome of a complex sample. The microbiome refers to all microbial organisms in a sample, and the resistome refers to all of the antimicrobial resistance (AMR) genes in pathogenic and non-pathogenic bacteria. Single nucleotide polymorphisms (SNPs) can be effectively used to "fingerprint" specific organisms and genes within the microbiome and resistome, and to trace their movement across various samples. However, in order to use these SNPs effectively for this traceability, a scalable and accurate metagenomics SNP caller is needed. Moreover, such a SNP caller should not rely on reference genomes, since 95% of microbial species are unculturable, making the determination of a reference genome extremely challenging. In this paper, we address this need.

RESULTS: We present LueVari, a reference-free SNP caller based on the read colored de Bruijn graph, an extension of the traditional de Bruijn graph that allows repeated regions longer than the k-mer length and shorter than the read length to be identified unambiguously. LueVari is able to identify SNPs in both AMR genes and chromosomal DNA from shotgun metagenomics data with reliable sensitivity (between 91% and 99%) and precision (between 71% and 99%), whereas the performance of competing methods varies widely. Furthermore, we show that LueVari constructs sequences containing the variation that span up to 97.8% of the genes in the datasets, which can be helpful in detecting distinct AMR genes in large metagenomic datasets.

AVAILABILITY: Code and datasets are publicly available at https://github.com/baharpan/cosmo/tree/LueVari.
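To illustrate the idea behind the read colored de Bruijn graph mentioned under RESULTS, the following is a minimal toy sketch, not LueVari's implementation. It assumes that each read is given its own color (here simply its index) and that each k-mer stores the set of colors of the reads containing it. The function names `build_read_colored_dbg`, `successors`, and `color_consistent_successors` are hypothetical and chosen only for this example. It shows how a branch that is ambiguous in a plain de Bruijn graph, caused by a repeat longer than k but shorter than the read, can be resolved by following a single read color.

```python
from collections import defaultdict


def build_read_colored_dbg(reads, k):
    """Map each k-mer to the set of read ids (colors) that contain it."""
    colors = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            colors[read[i:i + k]].add(read_id)
    return colors


def successors(colors, kmer):
    """All k-mers in the graph that overlap `kmer` by k-1 characters."""
    candidates = (kmer[1:] + base for base in "ACGT")
    return [c for c in candidates if c in colors]


def color_consistent_successors(colors, kmer, read_id):
    """Successors of `kmer` that carry the given read color."""
    return [s for s in successors(colors, kmer) if read_id in colors[s]]


# Two toy reads sharing the repeat "ACGT" (length 4), with k = 3.
reads = ["AACGTT", "GACGTC"]
colors = build_read_colored_dbg(reads, k=3)

# Without colors, the k-mer "CGT" has two possible continuations.
ambiguous = successors(colors, "CGT")                          # ["GTC", "GTT"]

# Restricting to one read color leaves a single continuation.
from_read_0 = color_consistent_successors(colors, "CGT", 0)    # ["GTT"]
from_read_1 = color_consistent_successors(colors, "CGT", 1)    # ["GTC"]
```

In this sketch the colors are stored as plain Python sets for clarity; a tool meant for large metagenomic datasets would need a far more compact representation.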