All about starting trees

28 July 2014 by Remco Bouckaert

Before starting an MCMC run, BEAST uses State Node initialisers to set up the starting trees (and other parameters). Often a good starting tree is already available from another analysis (e.g. a maximum likelihood tree). Though it is against the spirit of MCMC, which guarantees any random starting state will lead to convergence, a good starting tree can help with large analysis to speed up convergence quite a bit. Of course another reason to specify a starting tree is if you want to do an analysis using just a single known tree topology.

We start with standard analysis, but see below for *BEAST analyses.

Standard analysis

For a standard analysis, there are three ways to specify a starting tree

  • a random tree
  • a tree in Newick format
  • a cluster tree, e.g. through UPGMA or neighbour joining

By default, BEAUti uses a RandomTree state-node initialiser to generate a starting tree. If you want to replace it with a Newick tree or cluster tree, first you need to remove (or comment out) the RandomTree element from the XML. For a tree in partition XYZ26, just search for RandomTree and remove the XML fragment with id RandomTree.t:XYZ26. It should look something like this:


    
        1.0
    

Before removing, make note of the initial attribute (@Tree.t:XYZ26> here) and taxa attribute (@XYZ26 here).

Newick starting tree

After removing the Random tree, in the same place, add a TreeParser using the following fragment


The attributes that you want to specify are

  • id="NewickTree.t:XYZ26" the ID should be unique, so something like NewickTree.t: plus the name of the tree would be suitable.
  • initial="@Tree.t:XYZ26" this refers to the tree being initialised. It should be the same as the initial attribute of the RandomTree that was removed.
  • taxa="@XYZ26" this refers to the alignment and ensures that taxa in the tree are lined with those in the alignment. For BEAST v2.1.x, this must be specified to prevent starting with a mislabelled tree
  • newick="((your,(tree,goes)),here)" obviously needs to be replaced with your own tree in Newick format.

This is all you need to know to set up a starting tree. There are a few optional attributes you can use to make life a bit easier:

  • adjustTipHeights is true by default, which means tips of the tree will be set to zero, or if tip dates are specified to these particular tip dates. By setting to adjustTipHeights="false", tips will be initialised by the heights in the Newick tree.
  • if adjustTipHeights="false" then threshold specifies threshold under which node heights (derived from lengths) are set to zero. This helps when there are numeric issues with adding the lengths.
  • scale scale used to multiply internal node heights during parsing. Useful for importing starting from external programs, for instance, RaxML tree rooted using Path-o-gen.
  • IsLabelledNewick="true" Is the Newick tree labelled (alternatively contains node numbers)?
  • If sLabelledNewick="false" then “offset="1" is the lowest taxa number. The default=1 but 0 is common as well. Taxa numbers are as they are ordered in the alignment referred to with the taxa attribute.

Cluster tree

After removing the Random tree, in the same place, add a ClusterTree with the following fragment


The attributes that you need to specify are id='UPGMATree.t:XYZ26', initial="@Tree.t:XYZ26" and taxa="@XYZ26", which are as for Newick tree (see above).

You can use different clustering algorithms by specifying the clusterType="upgma" attribute.

clusterType Description
single single link
complete complete link
upgma or average UPGMA=average link
mean mean link
centroid centroid
ward Ward’s method
adjcomplete adjusted complete link
neighborjoining neighborjoining
neighborjoining2 neighborjoining2 – corrects tree for tip data, unlike plain neighborjoining

*BEAST analysis

For a *BEAST analysis, the default in BEAUti is to generate a UPGMA tree for the species tree, then generate UPGMA trees for each of the gene trees and fit them inside the species tree. In previous incarnations of BEAST, a random tree was generated for species as well as for gene trees, but gene tree branches ending in leaves were lengthened such that each first coalescence of a gene tree was above the root of the species tree. Though this ensures that the gene trees fit inside the species tree, it leads to long burn-in times.

If you want to use a different starting tree for the species tree, first move (not remove!) the StarBeastStartState element generated by BEAUti to just before the run-element with id="mcmc". BEAUti will have generated some elements inside the start-state (such as the species-tree-prior) that we need elsewhere in the analysis. By moving the init element outside the run element, it will not activate the StarBeastStartState.

The StarBeastStartState looks something like this:


    
    
    

For every gene tree there will be a tree entry in the StarBeastStartState. To replace the species tree with a Newick tree, add the following fragment (at the place where the StarBeastStartState was):


The only thing to set up is the Newick using the newick attribute — all other attributes should be as they are (it is using the species tree, which is identified by the Species partition in every *BEAST analysis generated by BEAUti).

For each of the gene trees, you need to specify a tree as well. You can specify them using a Newick tree as for the Standard analysis (see above). Alternatively, you can use a random tree that is placed ‘above’ the species tree. To do this, you add for the first gene tree (just below the new Species tree start tree):


  

where you replace the initial='@Tree.t:26' and taxa="@26" attributes to match your tree (instead of 26, as in this example). For every subsequent gene tree, you add (below the first gene start tree):


where you replace the initial='@Tree.t:29' and taxa="@29" attributes to match your tree (instead of 29) and make sure the id is unique.

Leave a Reply