SOAP3: GPU-based compressed indexing and ultra-fast parallel alignment of short reads

2011 
As the cost efficiency of next-generation DNA sequencing technology keeps improving, there is an ever-increasing demand for high-throughput software to align the enormous number of short reads (patterns) with reference genomes (such as the human genome). In the past few years, a number of very fast alignment tools (e.g., Maq, SOAP2, ZOOM, Bowtie, BWA) have been developed, most of which exploit some kind of compressed index (such as the BWT) of the reference sequence. These tools can align at a speed of a few hundred seconds per one million reads. Yet this speed still lags behind the throughput of next-generation sequencing machines. In this paper we show the first implementation of a compressed index on the GPU. The technical issues include how to reduce the memory accesses to the index from individual threads (cores) and how to control the branching and divergence of the threads to avoid unnecessary idle time. Based on this new index, we are able to exploit the parallelism of the GPU to speed up the alignment of short reads. Our experiments show that, with respect to the human genome, the GPU takes only a few seconds to perform exact alignment of one million length-100 reads, and tens of seconds when a few mismatches are allowed. Further improvement has been obtained by utilizing the host CPU to share the load of the GPU concurrently.
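For readers unfamiliar with BWT-based indexing, the exact-alignment step mentioned above can be illustrated with a minimal CPU-only sketch of backward search. This is not the SOAP3 implementation: the function names `build_bwt_index` and `exact_match` are hypothetical, the index is stored uncompressed, the suffix array is built naively, and no GPU parallelism is involved. It only shows why each read character costs a constant number of index lookups, which is the kind of memory access a GPU implementation would need to keep small.

```python
from collections import Counter


def build_bwt_index(reference):
    """Build a toy, uncompressed BWT index (hypothetical helper, not SOAP3's)."""
    text = reference + "$"  # '$' is a sentinel that sorts before A, C, G, T
    # Naive suffix array construction; acceptable only for tiny examples.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in suffix_array)
    # first_row[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    first_row, total = {}, 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return suffix_array, first_row, occ


def exact_match(read, index):
    """Return sorted start positions of exact occurrences of read."""
    suffix_array, first_row, occ = index
    lo, hi = 0, len(suffix_array)  # half-open suffix array range [lo, hi)
    # Backward search: extend the match one character at a time, right to left.
    for c in reversed(read):
        if c not in first_row:
            return []  # character never occurs in the reference
        lo = first_row[c] + occ[c][lo]
        hi = first_row[c] + occ[c][hi]
        if lo >= hi:
            return []  # range became empty: no occurrence
    return sorted(suffix_array[i] for i in range(lo, hi))


index = build_bwt_index("ACGTACGTGACG")
print(exact_match("ACG", index))  # [0, 4, 9]
```

In this sketch each read is processed independently of the others, so one natural (assumed) way to parallelize is to assign reads to separate threads; the abstract's concerns about memory accesses and thread divergence arise from exactly this per-read loop.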