Supporting data for "MinION nanopore sequencing of environmental metagenomes: a synthetic approach"

2017 
Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing (WGS) or 16S amplicon sequences. Both of these approaches are limited by read length and other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥10,000 bp in length, potentially allowing more precise assignment and thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Long-read sequences generated from libraries prepared from single strains using the SQK-MAP005 kit and chemistry, run on the original MinION device, yielded between 224 and 3,497 bidirectional high-quality (2D) reads, with an average length across the study of 6,000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of mis-assigned reads were assigned to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high-quality 2D reads of approximately 5,500 bp, of which up to 98% were correctly assigned to the species level.
Synthetic metagenomes from MinION libraries generated using the SQK-MAP006 kit and chemistry yielded 899 to 3,497 2D reads with lengths averaging 5,700 bp and up to 98% assignment accuracy at the species level. The observed community proportions for the “equal” and “rare” synthetic libraries were close to the known proportions, deviating by 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at 99% of reads were assigned to the correct family.
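The comparison of observed and known community proportions can be illustrated with a minimal sketch. It assumes that deviation is measured as the absolute difference in percentage points between the observed read fraction and the known input fraction; the abstract does not state the exact definition. The function name, species labels, and read counts are hypothetical placeholders, not data from this study.

```python
def proportion_deviation(read_counts, expected):
    """Return observed-minus-expected deviation per taxon, in percentage points.

    read_counts: dict mapping taxon name to number of assigned reads
    expected:    dict mapping taxon name to known input fraction (0 to 1)
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no assigned reads")
    deviations = {}
    for taxon, expected_fraction in expected.items():
        observed_fraction = read_counts.get(taxon, 0) / total
        deviations[taxon] = (observed_fraction - expected_fraction) * 100
    return deviations


# Known design of the "rare" community: three components at 33%, one at 1%.
expected_rare = {
    "species_A": 0.33,
    "species_B": 0.33,
    "species_C": 0.33,
    "species_D": 0.01,
}

# Hypothetical read counts, for illustration only (not results from the study).
hypothetical_counts = {
    "species_A": 340,
    "species_B": 320,
    "species_C": 330,
    "species_D": 10,
}

for taxon, dev in proportion_deviation(hypothetical_counts, expected_rare).items():
    print(f"{taxon}: {dev:+.1f} percentage points")
```

The same function would apply to the output of any of the three classifiers, provided the per-taxon read counts are first extracted from that tool's report.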