Let $T$ be an arbitrary phylogenetic tree with $n$ leaves. It is well known that the average quartet distance between two assignments of taxa to the leaves of $T$ is $\frac{2}{3}\binom{n}{4}$. However, a longstanding conjecture of Bandelt and Dress asserts that $(\frac{2}{3}+o(1))\binom{n}{4}$ is also the maximum quartet distance between two assignments. While Alon, Naves, and Sudakov have shown that this holds for caterpillar trees, the general case of the conjecture remains open. A natural extension arises when partial information is given: the two assignments are known to coincide on a given subset of taxa. This partial information setting is biologically relevant, since the locations of some taxa (species) in the phylogenetic tree may be known while those of others are not. What can we then say about the average and maximum quartet distance in this more general setting? Surprisingly, even determining the average quartet distance becomes nontrivial in the partial information setting, and determining the maximum quartet distance is even more challenging, as both turn out to depend on the structure of $T$. In this paper we prove nontrivial asymptotic bounds, which are sometimes tight, on the average quartet distance in the partial information setting. We also show that the Bandelt and Dress conjecture does not hold in general under the partial information setting: we prove that there are cases where the average and maximum quartet distances differ substantially.
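As a sanity check on the average stated above, here is a small brute-force sketch of the quartet distance in the full-information setting. It is an illustration under stated assumptions, not the paper's method: trees are assumed unrooted and binary (every internal vertex has degree 3) with unit edge lengths, the split induced by four leaves is read off with the four-point condition, and an assignment is a bijection from taxa 0..n-1 to leaves 0..n-1. All function names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def caterpillar_edges(n):
    """Hypothetical helper: a binary caterpillar with leaves 0..n-1 (n >= 4).

    Spine (internal) vertices are n..2n-3; every internal vertex has degree 3."""
    spine = list(range(n, 2 * n - 2))
    edges = list(zip(spine, spine[1:]))
    edges += [(0, spine[0]), (1, spine[0]), (n - 2, spine[-1]), (n - 1, spine[-1])]
    edges += [(i, spine[i - 1]) for i in range(2, n - 2)]
    return edges


def leaf_distances(edges, n):
    """BFS from each leaf; dist[a][b] is the number of edges between leaves a and b."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = []
    for s in range(n):
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist.append([seen[t] for t in range(n)])
    return dist


def topology(dist, a, b, c, d):
    """Split of leaves (a, b, c, d) by position: 0 = ab|cd, 1 = ac|bd, 2 = ad|bc.

    Four-point condition: in a binary tree the induced split has the strictly
    smallest sum of pairwise distances."""
    sums = [dist[a][b] + dist[c][d], dist[a][c] + dist[b][d], dist[a][d] + dist[b][c]]
    return sums.index(min(sums))


def quartet_distance(dist, sigma, pi):
    """Number of 4-sets of taxa whose induced splits differ; sigma[t] is taxon t's leaf."""
    count = 0
    for q in combinations(range(len(sigma)), 4):
        s = topology(dist, *(sigma[t] for t in q))
        p = topology(dist, *(pi[t] for t in q))
        count += (s != p)
    return count


def average_and_max(edges, n):
    """Exact average and maximum quartet distance over all pairs of assignments."""
    dist = leaf_distances(edges, n)
    identity = tuple(range(n))
    # Relabelling taxa gives d(sigma, pi) = d(identity, pi o sigma^-1), so a uniformly
    # random pair of assignments has the same distribution as (identity, random tau).
    values = [quartet_distance(dist, identity, tau) for tau in permutations(range(n))]
    return Fraction(sum(values), len(values)), max(values)


if __name__ == "__main__":
    for n in (5, 6, 7):
        avg, mx = average_and_max(caterpillar_edges(n), n)
        print(n, avg, Fraction(2, 3) * comb(n, 4), mx)
```

For binary trees the computed average coincides with $\frac{2}{3}\binom{n}{4}$ exactly, because for any fixed 4-set of taxa a uniformly random assignment induces each of the three possible splits with probability 1/3. Enumerating all $n!$ assignments restricts this to very small $n$, so the printed maxima say nothing about the asymptotic statement of the conjecture.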
Funding information: The author's research is supported in part by ISF grant 1082/16.
© 2021 Wiley Periodicals LLC
Keywords
- phylogenetic tree