Impossibility of phylogeny reconstruction from $k$-mer counts
2020
We consider phylogeny estimation under a two-state model of sequence evolution by site substitution on a tree. In the asymptotic regime where the sequence lengths tend to infinity, we show that for any fixed $k$ no statistically consistent phylogeny estimation is possible from $k$-mer counts of the leaf sequences alone. Formally, we establish that the joint leaf distributions of $k$-mer counts on two distinct trees have total variation distance bounded away from $1$ as the sequence length tends to infinity. That is, the two distributions cannot be distinguished with probability going to one in that asymptotic regime. Our results are information-theoretic: they imply an impossibility result for any reconstruction method using only $k$-mer counts at the leaves.
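To make the object of study concrete, here is a minimal sketch of what a $k$-mer count summary of a two-state leaf sequence looks like. The function name `kmer_counts`, the alphabet `"01"`, and the example sequences are assumptions made for illustration and do not come from the paper. The sketch only shows that the summary discards site positions; it is not the paper's argument.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_counts(seq: str, k: int, alphabet: str = "01") -> Dict[str, int]:
    """Count each length-k window of seq, listing unseen k-mers with count 0."""
    if k < 1 or k > len(seq):
        raise ValueError("k must satisfy 1 <= k <= len(seq)")
    # Slide a window of width k over the sequence and tally each window.
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Report all |alphabet|^k possible k-mers in lexicographic order.
    all_kmers = ("".join(w) for w in product(alphabet, repeat=k))
    return {kmer: observed.get(kmer, 0) for kmer in all_kmers}


# Two different hypothetical leaf sequences with identical 2-mer counts.
print(kmer_counts("0110", 2))  # {'00': 0, '01': 1, '10': 1, '11': 1}
print(kmer_counts("0110", 2) == kmer_counts("1101", 2))  # True
```

The sequences `0110` and `1101` differ at three of their four sites yet share the same 2-mer counts, so an estimator that sees only these counts cannot use which sites agree or disagree across leaves.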