On the quartet distance given partial information

11/25/2021
by Sagi Snir, et al.

Let T be an arbitrary phylogenetic tree with n leaves. It is well known that the average quartet distance between two assignments of taxa to the leaves of T is $\frac{2}{3}\binom{n}{4}$. A longstanding conjecture of Bandelt and Dress asserts that $(\frac{2}{3}+o(1))\binom{n}{4}$ is also the maximum quartet distance between two assignments. While Alon, Naves, and Sudakov have shown that this indeed holds for caterpillar trees, the general case of the conjecture is still unresolved. A natural extension is the setting in which partial information is given: the two assignments are known to coincide on a given subset of taxa. The partial information setting is biologically relevant, as the location of some taxa (species) in the phylogenetic tree may be known while the location of others is not. What can we then say about the average and maximum quartet distance in this more general setting? Surprisingly, even determining the average quartet distance becomes a nontrivial task in the partial information setting, and determining the maximum quartet distance is even more challenging, as both turn out to depend on the structure of T. In this paper we prove nontrivial asymptotic bounds, which are sometimes tight, for the average quartet distance in the partial information setting. We also show that the Bandelt and Dress conjecture does not generally hold in the partial information setting. Specifically, we prove that there are cases where the average and maximum quartet distance differ substantially.
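
The quantities discussed above can be checked by brute force on very small instances. The following is a minimal Python sketch, not the paper's method: it assumes T is a caterpillar tree (chosen only because the split a caterpillar induces on four taxa can be read off from their left-to-right leaf positions), and the names `caterpillar_split`, `quartet_distance`, and `average_quartet_distance` are hypothetical. The optional `fixed` argument models the partial information setting by averaging only over assignments that agree with the reference assignment on the given taxa.

```python
from fractions import Fraction
from itertools import combinations, permutations


def caterpillar_split(assignment, quartet):
    """Split induced on four taxa by a caterpillar tree (an assumed tree shape).

    assignment[t] is the leaf position (0..n-1, left to right) of taxon t.
    In a caterpillar, the two taxa at the smaller positions are separated
    from the two taxa at the larger positions.
    """
    ordered = sorted(quartet, key=lambda t: assignment[t])
    return frozenset({frozenset(ordered[:2]), frozenset(ordered[2:])})


def quartet_distance(a, b):
    """Number of 4-subsets of taxa whose induced splits differ under a and b."""
    n = len(a)
    return sum(
        caterpillar_split(a, q) != caterpillar_split(b, q)
        for q in combinations(range(n), 4)
    )


def average_quartet_distance(n, fixed=()):
    """Exact average distance to the identity assignment, by enumeration.

    Taxa listed in `fixed` keep their position in both assignments; the
    remaining taxa are permuted among the remaining positions. Exponential
    in the number of free taxa, so only usable for tiny n.
    """
    identity = tuple(range(n))
    fixed_set = set(fixed)
    free = [t for t in range(n) if t not in fixed_set]
    total, count = 0, 0
    for image in permutations(free):
        other = list(identity)
        for taxon, position in zip(free, image):
            other[taxon] = position
        total += quartet_distance(identity, other)
        count += 1
    return Fraction(total, count)
```

With no fixed taxa, `average_quartet_distance(6)` returns 10, which equals $\frac{2}{3}\binom{6}{4}$, in line with the well-known average. Passing a nonempty `fixed` set lets one experiment with how the average changes once some taxa are pinned in place.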
