Impossibility of phylogeny reconstruction from k-mer counts

10/27/2020
by Wai-Tong Louis Fan, et al.

We consider phylogeny estimation under a two-state model of sequence evolution by site substitution on a tree. In the asymptotic regime where the sequence lengths tend to infinity, we show that for any fixed k no statistically consistent phylogeny estimation is possible from k-mer counts of the leaf sequences alone. Formally, we establish that the joint leaf distributions of k-mer counts on two distinct trees have total variation distance bounded away from 1 as the sequence length tends to infinity. That is, the two distributions cannot be distinguished with probability going to one in that asymptotic regime. Our results are information-theoretic: they imply an impossibility result for any reconstruction method using only k-mer counts at the leaves.
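
To make the data model concrete, here is a minimal Python sketch of what a k-mer-based method observes. It assumes a symmetric two-state (0/1) substitution model. The root sequence is i.i.d. uniform, each site flips independently along an edge with that edge's flip probability, and each leaf is summarized by its overlapping k-mer counts. The quartet topology, the flip probabilities, and the helper names `evolve` and `kmer_counts` are hypothetical and chosen only for illustration. The paper's exact parametrization may differ.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical rooted quartet ((A,B),(C,D)): each internal node maps to a list
# of (child, flip_probability) pairs. Topology and probabilities are illustrative.
TREE = {
    "root": [("u", 0.1), ("v", 0.1)],
    "u": [("A", 0.2), ("B", 0.2)],
    "v": [("C", 0.2), ("D", 0.2)],
}


def evolve(tree, n, rng):
    """Simulate an assumed two-state (0/1) site-substitution model on `tree`.

    Root states are i.i.d. uniform on {0, 1}; along each edge, every site flips
    independently with that edge's flip probability. Returns a dict mapping
    each leaf name to its sequence (a list of 0/1 ints of length n).
    """
    seqs = {"root": [rng.randint(0, 1) for _ in range(n)]}
    stack = ["root"]
    leaves = {}
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        if not children:  # no children: this node is a leaf
            leaves[node] = seqs[node]
            continue
        for child, p in children:
            # XOR each parent site with an independent Bernoulli(p) flip.
            seqs[child] = [s ^ int(rng.random() < p) for s in seqs[node]]
            stack.append(child)
    return leaves


def kmer_counts(seq, k):
    """Count every binary k-mer over the len(seq) - k + 1 overlapping windows."""
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    # Report all 2**k possible k-mers, including those that never occur.
    return {w: counts.get(w, 0) for w in product((0, 1), repeat=k)}


if __name__ == "__main__":
    rng = random.Random(0)
    leaves = evolve(TREE, n=1000, rng=rng)
    # Per-leaf k-mer counts are the only input a k-mer-based method receives.
    features = {leaf: kmer_counts(seq, k=2) for leaf, seq in leaves.items()}
    print(features["A"])
```

The per-leaf counts in `features` no longer record which site in one leaf aligns with which site in another. A method that sees only these counts therefore works from a much coarser summary than the aligned leaf sequences. For context on the abstract's "That is" step: for any test that chooses between two trees from such data, the sum of its two error probabilities is at least 1 minus the total variation distance between the two distributions. A distance bounded away from 1 therefore keeps the total error bounded away from 0.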
