Exploration of Short Reads Genome Mapping in Hardware

2010 
The newest generation of sequencing instruments, such as Illumina/Solexa Genome Analyzer and ABI SOLiD, can generate hundreds of millions of short DNA “reads” from a single run. These reads must be matched against a reference genome to identify their original location. Due to sequencing errors or variations in the sequenced genome, the matching procedure must allow a variable but limited number of mismatches. This problem is a version of the classic approximate string matching where a long text is searched for the occurrence of a set of short patterns. Typical strategies to speed up the matching involve elaborate hashing schemes that exploit the inherent repetitions of the data. However, such large data structures are not well suited for FPGA implementations. In this paper we evaluate an FPGA implementation that uses a “naive” approach which checks every possible read-genome alignment. We compare the performance of the naive approach to popular software tools currently used to map short reads to a reference genome showing a speedup of up to 4X over the fastest software tool.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    13
    References
    30
    Citations
    NaN
    KQI
    []