A performant bridge between fixed-size and variable-size seeding

2019 
Seeding is usually the initial step of high-throughput sequence aligners. Two popular seeding strategies are fixed-size seeding (k-mers, minimizers) and variable-size seeding (MEMs, SMEMs, maximally spanning seeds). The former benefits from fast index construction and fast seed computation, while the latter benefits from high seed entropy. Here we build a performant bridge between both strategies and show that neither is theoretically superior. We propose an algorithmic approach for computing MEMs from k-mers or minimizers. Further, we describe techniques for extracting SMEMs or maximally spanning seeds from MEMs. Comprehensive benchmarking shows the practical value of the proposed approaches. In this context, we report on the effects and fine-tuning of occurrence filters for the different seeding strategies.
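
The relationship between k-mers and MEMs stated in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, only a naive illustration under stated assumptions: every k-mer of the query is looked up (no minimizer sampling, no occurrence filter), only the forward strand is considered, and the reference index is a plain hash table. The function names `kmer_seeds` and `mems_from_kmers` are hypothetical. The sketch relies on the observation that k-mer hits lying on the same diagonal (reference position minus query position) at consecutive query positions overlap by k-1 characters, so each maximal run of such hits corresponds to one MEM of length at least k.

```python
from collections import defaultdict


def kmer_seeds(query, ref, k):
    """Return all exact k-mer matches as (query_pos, ref_pos) pairs."""
    # Hypothetical toy index: k-mer string -> list of reference positions.
    index = defaultdict(list)
    for r in range(len(ref) - k + 1):
        index[ref[r:r + k]].append(r)
    return [
        (q, r)
        for q in range(len(query) - k + 1)
        for r in index.get(query[q:q + k], [])
    ]


def mems_from_kmers(query, ref, k):
    """Merge k-mer seeds into MEMs of length >= k.

    Returns a sorted list of (query_start, ref_start, length) triples.
    Assumption: all k-mers of the query are used as seeds.
    """
    # Group seeds by diagonal; seeds of one MEM share the same diagonal.
    by_diag = defaultdict(list)
    for q, r in kmer_seeds(query, ref, k):
        by_diag[r - q].append(q)

    mems = []
    for diag, starts in by_diag.items():
        starts.sort()
        run_start = prev = starts[0]
        for q in starts[1:]:
            if q != prev + 1:
                # Gap on the diagonal: the current run is a complete MEM.
                mems.append((run_start, run_start + diag, prev - run_start + k))
                run_start = q
            prev = q
        mems.append((run_start, run_start + diag, prev - run_start + k))
    return sorted(mems)


if __name__ == "__main__":
    print(mems_from_kmers("ACGTACGT", "TTACGTAGG", 3))
    # prints [(0, 2, 5), (3, 1, 5)]
```

In this toy example, six 3-mer hits collapse into two MEMs of length 5. Handling minimizers instead of all k-mers would additionally require extending the merged seeds at both ends, which this sketch does not attempt.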