Constant-time and space-efficient unidirectional and bidirectional FM-indices using EPR-dictionaries.

2016 
We introduce a new method for conducting an exact search in a uni- and bidirectional FM index in $\mathcal{O}(1)$ time per step while using $\mathcal{O}(\log \sigma \cdot n) + o(\log \sigma \cdot \sigma \cdot n)$ bits of space. This is done by replacing the wavelet tree by a new data structure, the \emph{Enhanced Prefixsum Rank dictionary} (EPR-dictionary). To our knowledge this is the first constant time method for a search step in 2FM indices and a space improvement for FM indices. We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition we compared our implementation with other freely available implementations of bidirectional indices and show that we are between $\approx 2.7-4.6$ times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices e.g. for read mapping.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    1
    Citations
    NaN
    KQI
    []