Long branch attraction

In phylogenetics, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related. LBA arises when the amount of molecular or morphological change accumulated within a lineage is sufficient to make that lineage appear similar (and thus closely related) to another long-branched lineage solely because both have undergone a large amount of change, rather than because they are related by descent. Such bias is more common when the overall divergence of some taxa results in long branches within a phylogeny. Long branches are often attracted to the base of a phylogenetic tree, because the lineage included to represent an outgroup is often also long-branched. The frequency of true LBA is unclear and often debated, and some authors view it as untestable and therefore irrelevant to empirical phylogenetic inference. Although often viewed as a failing of parsimony-based methodology, LBA could in principle result from a variety of scenarios and be inferred under multiple analytical paradigms.
LBA was first recognized as problematic in analyses of discrete morphological character sets under parsimony criteria; however, maximum likelihood analyses of DNA or protein sequences are also susceptible. A simple hypothetical example can be found in Felsenstein 1978, where it is demonstrated that for certain unknown 'true' trees, some methods can show bias for grouping long branches, ultimately resulting in the inference of a false sister relationship. Often this is because convergent evolution of one or more characters included in the analysis has occurred in multiple taxa. Although they were derived independently, these shared traits can be misinterpreted in the analysis as being shared due to common ancestry.
The result of LBA in evolutionary analyses is that rapidly evolving lineages may be inferred to be sister taxa, regardless of their true relationships.
For example, in DNA sequence-based analyses, the problem arises when sequences from two (or more) lineages evolve rapidly. There are only four possible nucleotides, and when DNA substitution rates are high, the probability that two lineages will independently evolve the same nucleotide at the same site increases. When this happens, a phylogenetic analysis may erroneously interpret this homoplasy as a synapomorphy (i.e., as a state that evolved once in the common ancestor of the two lineages).
Assume for simplicity that we are considering a single binary character (it can be either + or -) distributed on the unrooted 'true tree' with branch lengths proportional to the amount of character state change, shown in the figure. Because the evolutionary distance from B to D is small, we assume that in the vast majority of cases B and D will exhibit the same character state. Here, we will assume that they are both + (+ and - are assigned arbitrarily, and swapping them is only a matter of definition). If this is the case, there are four remaining possibilities. A and C can both be +, in which case all taxa are the same and all the trees have the same length. A can be + and C can be -, in which case only one taxon differs from the rest, and we cannot learn anything, as all trees have the same length. Similarly, A can be - and C can be +. The only remaining possibility is that A and C are both -. In this case, however, we view either A and C, or B and D, as a group with respect to the other (one character state is ancestral, the other is derived, and the ancestral state does not define a group). As a consequence, when we have a 'true tree' of this type, the more data we collect (i.e., the more characters we study), the more of them are homoplastic and support the wrong tree. Of course, when dealing with empirical data in phylogenetic studies of actual organisms, we never know the topology of the true tree, and the more parsimonious (AC) or (BD) might well be the correct hypothesis.
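The four-taxon argument above can be checked numerically. The following is a minimal sketch, not a reconstruction of any published analysis. It assumes the true tree is ((A,B),(C,D)), that A and C sit on long branches with per-branch change probability `p_long`, and that B, D and the internal branch are short with change probability `q_short`. These parameter names, the function names and the symmetric two-state model are illustrative assumptions. The code enumerates every pattern of + and - across the four taxa and sums the probability of the patterns that support each of the three possible groupings.

```python
from itertools import product


def pattern_probabilities(p_long, q_short):
    """Exact probability of each tip pattern (A, B, C, D) for one binary character.

    Assumed true tree: ((A,B),(C,D)). A and C are on long branches
    (change probability p_long); B, D and the internal branch are short
    (change probability q_short). Both values are illustrative assumptions.
    """
    def edge(parent, child, change):
        # Probability of observing `child` at the end of a branch starting in `parent`.
        return change if parent != child else 1.0 - change

    probs = {}
    for a, b, c, d in product("+-", repeat=4):
        total = 0.0
        # Sum over the unobserved states x and y of the two internal nodes.
        for x, y in product("+-", repeat=2):
            total += (0.5 * edge(x, y, q_short)
                      * edge(x, a, p_long) * edge(x, b, q_short)
                      * edge(y, c, p_long) * edge(y, d, q_short))
        probs[(a, b, c, d)] = total
    return probs


def informative_support(p_long, q_short):
    """Total probability of the two-versus-two patterns supporting each grouping."""
    support = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for (a, b, c, d), pr in pattern_probabilities(p_long, q_short).items():
        if a == b and c == d and a != c:
            support["AB|CD"] += pr   # agrees with the assumed true tree
        elif a == c and b == d and a != b:
            support["AC|BD"] += pr   # groups the two long branches
        elif a == d and b == c and a != b:
            support["AD|BC"] += pr
    return support


def favoured_split(p_long, q_short):
    """Grouping that receives the most support per character under parsimony."""
    support = informative_support(p_long, q_short)
    return max(support, key=support.get)


if __name__ == "__main__":
    for p_long in (0.1, 0.4):
        print(p_long, favoured_split(p_long, 0.05), informative_support(p_long, 0.05))
```

With these assumed values, the grouping that matches the true tree receives the most support when the long branches are only moderately long (`p_long = 0.1`), while the incorrect grouping of A with C dominates when they are much longer (`p_long = 0.4`). Because the probabilities are per character, sampling more characters in the second case only strengthens support for the wrong tree, which is the behaviour described in the text.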
