Local decoding of sequences and alignment-free comparison.

2006 
Subword composition plays an important role in a lot of analyses of sequences. Here we define and study the "local decoding of order N of sequences," an alternative that avoids some drawbacks of "subwords of length N" approaches while keeping informations about environments of length N in the sequences ("decoding" is taken here in the sense of hidden Markov modeling, i.e., associating some state to all positions of the sequence). We present an algorithm for computing the local decoding of order N of a given set of sequences. Its complexity is linear in the total length of the set (whatever the order N) both in time and memory space. In order to show a use of local decoding, we propose a very basic dissimilarity measure between sequences which can be computed both from local decoding of order N and composition in subwords of length N. The accuracies of these two dissimilarities are evaluated, over several datasets, by computing their linear correlations with a reference alignment-based distance. These accu...
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    18
    References
    13
    Citations
    NaN
    KQI
    []