Faster algorithms for 1-mappability of a sequence

2020 
In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where \(k=1\). The fastest known algorithm for \(k=1\) requires time \(\mathcal {O}(mn \log n/\log \log n)\) and space \(\mathcal {O}(n)\). We present two new algorithms that require worst-case time \(\mathcal {O}(mn)\) and \(\mathcal {O}(n \log n \log \log n)\), respectively, and space \(\mathcal {O}(n)\), thus greatly improving the state of the art. Moreover, we present another algorithm that requires average-case time and space \(\mathcal {O}(n)\) for integer alphabets of size \(\sigma \) if \(m=\varOmega (\log _\sigma n)\). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time \(\mathcal {O}(kn)\) and space \(\mathcal {O}(n)\) if \(m=\varOmega (k\log _\sigma n)\).
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    23
    References
    1
    Citations
    NaN
    KQI
    []