
Compression of Nanopore FASTQ Files

2019 
Research and development of tools for genomic data compression has so far focused on data generated by second-generation sequencing technologies, while third-generation technologies, such as nanopore sequencing, have received little attention in the data compression research community. In this paper, we investigate compression schemes for nanopore FASTQ files. We propose a compressor for nanopore quality scores, called DualCtx, which yields significant improvements in compression performance over the state of the art. We also extend DualCtx to a full FASTQ compressor, termed DualFqz, by substituting DualCtx for the quality score compression module in a variant of Fqzcomp. We tested DualFqz and various existing compressors on a large nanopore data set. The results show that DualFqz achieves the best compression performance. The experiments also show that most current compressor implementations fail to execute correctly on files with long, variable-length reads.