GPU-accelerated adaptive compression framework for genomics data

2013 
Genomics data is being produced at an unprecedented rate, especially in the context of clinical applications and grand challenge questions. Genomics research involves various types of data, most of which are stored as plain-text tables. This paper introduces a data compression framework tailored to this file type, combining generic compression algorithms, GPU acceleration, and column-major storage. This approach is the first to achieve both compression and decompression rates of around 100 MB/s on commodity hardware without compromising compression ratio. By selecting an appropriate compression scheme for each column of data, the framework efficiently exploits data redundancy while remaining applicable to a wide range of formats. The GPU-accelerated implementation also exploits the parallelism of the compression algorithms. Finally, this paper presents a novel transformation based on a first-order Markov model, with evidence that it is at least as effective as Burrows-Wheeler and Move-To-Front in some contexts.
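The idea of column-major storage with a per-column choice of compression scheme can be illustrated with a minimal CPU-only sketch. This is not the paper's implementation: the codec set (zlib, bz2, lzma from the Python standard library), the "smallest output wins" selection rule, and the names `CODECS`, `compress_table`, and `decompress_table` are all assumptions made for illustration, and no GPU acceleration or Markov-model transformation is attempted.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: generic compressors from the standard library.
# The framework's actual algorithms are not specified in the abstract.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_table(text, sep="\t"):
    """Split a plain-text table into columns and compress each column
    with whichever codec produces the smallest output.

    Assumes every row has the same number of fields; a trailing
    newline in `text` is not preserved.
    """
    rows = [line.split(sep) for line in text.splitlines()]
    if not rows:
        return []
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of fields")

    packed = []
    for c in range(n_cols):
        # Column-major layout: all values of one column stored contiguously.
        column = "\n".join(row[c] for row in rows).encode("utf-8")
        # Try every codec and keep the name and output of the smallest one.
        name, blob = min(
            ((n, comp(column)) for n, (comp, _) in CODECS.items()),
            key=lambda item: len(item[1]),
        )
        packed.append((name, blob))
    return packed


def decompress_table(packed, sep="\t"):
    """Invert compress_table: decode each column, then re-interleave rows."""
    columns = [
        CODECS[name][1](blob).decode("utf-8").split("\n")
        for name, blob in packed
    ]
    return "\n".join(sep.join(fields) for fields in zip(*columns))
```

Because each column is compressed independently, the columns could in principle be processed in parallel, which is the kind of parallelism a GPU implementation might exploit; this sketch processes them sequentially.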