The efficient algorithm for mapping next generation sequencing reads to reference genome

Patryk Pankiewicz, Wiktor Kuśmirek, Robert Nowak

2019

One of the main problems in genomics is finding similarities between different species represented by DNA sequences. Dynamic programming algorithms (Needleman-Wunsch, Smith-Waterman) give a good measure of similarity but are not efficient for large data sets. In this study we present a new heuristic algorithm based on common parts of reads. The approach can handle all types of sequencing errors: insertions, deletions and substitutions. The results of our algorithm are similar to those of other well-known tools. The presented algorithm is implemented in C++, uses the Boost libraries, and internally uses threads for parallel computing. The algorithm is part of the DNA assembler 'dnaasm'. Source code, a demo application and supplementary materials are available at the project homepage: http://dnaasm.sourceforge.net.
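To illustrate why the dynamic programming baseline becomes costly on large data sets, the following is a minimal sketch of Needleman-Wunsch global alignment scoring. It is not the heuristic presented in this study and is not taken from the dnaasm source code; the function name `needlemanWunschScore` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Global alignment score (Needleman-Wunsch) with a linear gap penalty.
// Scoring parameters are illustrative assumptions, not values from the paper.
// Every cell of an (n+1) x (m+1) table is visited once, so the running time
// is O(n*m); only two rows are kept, so the memory use is O(m).
int needlemanWunschScore(const std::string& a, const std::string& b,
                         int match = 1, int mismatch = -1, int gap = -1) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<int> prev(m + 1), curr(m + 1);

    // First row: aligning an empty prefix of `a` against b[0..j) costs j gaps.
    for (std::size_t j = 0; j <= m; ++j) {
        prev[j] = static_cast<int>(j) * gap;
    }

    for (std::size_t i = 1; i <= n; ++i) {
        // First column: aligning a[0..i) against an empty prefix of `b`.
        curr[0] = static_cast<int>(i) * gap;
        for (std::size_t j = 1; j <= m; ++j) {
            const int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            const int up = prev[j] + gap;        // gap in `b`
            const int left = curr[j - 1] + gap;  // gap in `a`
            curr[j] = std::max({diag, up, left});
        }
        std::swap(prev, curr);
    }
    return prev[m];
}
```

Under these assumed scores, `needlemanWunschScore("ACGT", "AGT")` evaluates to 2 (three matches and one gap). Because the cost grows with the product of the two sequence lengths, aligning every read against a whole reference genome in this way is impractical, which is the motivation for heuristic mappers.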

Keywords:

Computational biology
DNA sequencing
Reference genome
Computer science
Efficient algorithm
