Tag Archives: BEAUti

Help, BEAST acts weird! (or how to set up rates)

23 June 2015 by Remco Bouckaert

“What is going wrong?” is an often asked question. There can be many things going wrong, but there is one thing that goes wrong more often than other things and it easy to fix.

The first thing you want to check is the settings of the rates in BEAUti. There are two places where rates are set:

  • The site model panel, where the substitution rate is set
  • The clock model panel where the clock rate is set

The final rate used is the product of these rates.

The way to think of the substitution rate is that it is a relative rate with respect to other partitions, while the clock rate is the overall rate for substitutions per site per year (or any other unit of time you choose to use). So, substitution rates will be numbers close to 1, while clock rates tend to be small numbers, such as 4e-9 substitutions per site per year.

Substitution rates

To set up the substitution rates, use this chart:

Standard analysis

For an analysis using the Standard template, you can go to the clock model tab and use this chart to set up the clock rate:

* Partitions can be ordered arbitrarily. With the first partition I mean the one for which there are either calibrations, tip dates or a rate from the literature, which usually is the first partition listed in the list of clocks, but may be a later one as well.

** Set the clock rate to “1e-x” where x is a number that is somewhere in the region you expect it for your data helps to get through burn-in faster. You could leave it at the default value of 1.0, but it just takes longer to reach convergence. Assuming you are using years as units of time, workable values are 1e-9 for nuclear data, 1e-6 for mitochondrial, bacterial and DNA viral data and 1e-4 for RNA viral data, but if you have more specific information about your sequences it helps to use it to specify starting value.

*BEAST analysis

*BEAST analysis are a bit different in that tip dates are not allowed (at the time of writing) and calibrations are on the species tree, not the gene tree. Usually, all clock rates but the first are estimated using a broad prior. To decide whether the first rate should be estimated or not, use the chart above.

If BEAST still acts weird after rates are set up correctly, just post a question on the BEAST user list.

Better BEAUti templates

16 June 2015 by Remco Bouckaert

When developing a BEAUti template, you have to keep in mind that a BEAST model is directed acyclic graph of BEAST objects such as for example shown here. A BEAUti template describes a sub-graph that can be slotted into the overall model. This means the template has to define two things:

  1. A set of BEAST objects
  2. A set of rules on how to put the sub-network into the full graph

Up to now, the rules on how to connect networks to the graph was through BeautiConnector rules specifying the srcID of one of the BEAST objects in the sub-network, and a targetID and inputName specifying which object in the larger network to connect to. Furthermore, connections are not always necessary; if a parameter is kept fixed instead of estimated, there is no need to log it, so there is no need to connect that parameter to any logger. A BeautiConnector only connects conditional on whatever is specified in the if attribute.

Below is a simple template that specifies the HKY substitution model. The BEAST objects are specified in the CDATA section: the HKY substitution model and its kappa parameter and frequencies object, two operators and a prior on kappa. If kappa is estimated (the default) kappa should be connected to the state (see first connector rule). Likewise for frequencies (second rule).


    
    
    
	
    



    
	
	
    





]]>




Scale HKY transition-transversion
parameter of partition s:$(n)


Exchange values of frequencies of partition s:$(n)





HKY transition-transversion
parameter of partition s:$(n)


From BEAST v2.3.0, the connector rules can be integrated in the XML fragment that specify the BEAST objects. At the top level, the target objects are specified through their IDs, and anything that need to be connected can be inserted through nesting these objects. The conditions are encoded in the if attribute in the beauti namespace, so they look like beauti:if followed by the condition. Let’s walk through the example above.

First, we specify the subtemplate header, and the HKY object:


    
    
    
	
    

So far nothing different from above. However, next we will connect the kappa and frequency parameter to the state; just define a state element and specify an idref to the state object. The kappa and frequency parameters will be connected to the stateNode input, for which we specify two stateNode elements, and idrefs to the kappa and frequency parameters specified in the HKY block above.


	
	

This replaces the first two connector rules in the original template.

Next, we define the prior on kappa. Since it will be connected to the prior, we wrap it in a distribution element with idref to prior. The condition on which to connect (that kappa is in the likelihood and is estimated) is specified in the prior element. We added name="distribution" in order to ensure the prior is connected to the distribution input.


	
	    
		
		
	    
	

There is another way to specify conditions which is especially handy when there are several items to be connected that are to be connected under the same condition. The if element specified just inside a top-level element in the CDATA block is interpreted as a condition that applies to all of the elements inside. The condition itself is specified in the cond attribute. For example, the operators can be defined like so;

    

    
        
    
    
        
    

That leaves the connections to the loggers to be defined;


    
                

]]>

And that completes the HKY template.

The hope is that this new way for specifying sub-templates is a bit more intuitive once you are used to writing plain BEAST 2 XML. The idea is that going from an example XML to a sub-template just means

  • remove all XML elements outside sub-graph that is not connected to
  • replace all BEAST objects outside the sub-graph with idrefs
  • add conditions (in beauti:if attributes or if elements)

Then the only thing left is to rename IDs so they math partition information and wrap the BEAUti subtemplate bits around the CDATA section.

BEAST package for a tree prior

23 June 2014 by Remco Bouckaert

The birth-death skyline model cite{Stadler} is a tree prior that can deal with serially sampled tip dates. It can be implemented as a lean CalculationNode, which takes a set of five parameters that can be estimated as input, plus an input for the number of intervals in the skyline model. The birth and death rate, and the sampling rate have gamma priors on them, and there are a few functions of the parameters we want to log. The complete model is illustrated in below. The parameters, tree, operators, priors and loggers are already available in BEAST. The loggers use the RPNcalculator class to calculate a simple arithmetic function of the parameters. This only leaves code to be developed for the details of the birth-death skyline model itself.

bdsky

The Birth-Death Skyline model consists of a tree prior, six parameters, a number of operators on these parameters, some priors on these parameters, two specialised loggers, and a set of rules to connect model, e.g., the tree to the prior, the loggers to the trace-log, etc. For clarity, the links between parameters and state are omitted as well as the links between parameters and trace-log.

Step 1: Write BEASTObject – develop code

The code was developed by Denise K”uhnert, and we will have a look at some of the more interesting areas. It starts with a proper description.

@Description("Adaptation of Tanja Stadler's BirthDeathSamplingModel, to allow for birth and death rates to change at times t_i")

The class definition starts defining the five parameter inputs and the input for the number of intervals. Further, there is a boolean input that influences the behaviour.

public class BirthDeathSkylineModel extends SpeciesTreeDistribution {
public Input intervalTimes =
        new Input("intervalTimes", "The times t_i specifying when rate changes can occur", (RealParameter) null);
...
public Input forceRateChange =
        new Input("forceRateChange", "If there is more than one interval and we estimate the time of rate change, do we enforce it to be within the tree interval? Default true", true);

Then follow the member variables for shadowing input parameters and storing intermediate results. This is a thin CalculationNode so there are no member variables for storing/restoring intermediate results.

double t_root;
int m;
protected double[] p0_iMinus1;
...
}

The initAndValidate method sets up member variables and does some sanity checks on the inputs.

@Override
public void initAndValidate() throws Exception {
    super.initAndValidate();

    m = intervalNumber.get().getValue();

    if (intervalTimes.get() != null){
        if (intervalTimes.get().getDimension() != m)
            throw new RuntimeException("Length of intervalTimes parameter should equal to intervalNumber (" + m + ")");
        timesFromXML = true;
    }
}

Then, a few hundreds of lines of code are omitted, which perform the actual calculations. The method of interest for a tree-prior is calculateTreeLogLikelihood for a tree. We refer to the actual code for details, but this is where the most interesting part of the class resides.

@Override
public double calculateTreeLogLikelihood(Tree tree) {
    m = intervalNumber.get().getValue();
	// details omitted ...
    return logP;
}

Finally, the implementation of the CalculationNode is minimal, since this is a thin CalculationNode. The methods store and restore are not even overridden, and just use the base implementation, while requiresRecalculation is the most minimal implementation possible.

@Override
protected boolean requiresRecalculation() {
    return true;
}

Step 2: Test BEASTObject

To test the new BEASTObject, write JUnit tests and create an example XML file, which typically goes in the examples directory. This example file can work as a blue-print for the BEAUti template to a large extend.

Step 3: Write BEAUti template}

We will go through the BEAUti template highlighting some of the aspects. Some of the detail is omitted for brevity and replaced by `…’ . Since a BEAUti template is in BEAST XML format, we start with a BEAST 2 header. For brevity, we introduce


This prior is a tree-prior, and wherever the main template has a merge point for tree priors, this prior should be made available. mergewith is a reserved XML element name, so no spec attribute is necessary.


This is the start of the sub-template definition. Again, there is no spec attribute, but this time it is assumed that the main template has a map-element that maps the subtemplate element name to the BeautiSubTemplate class. Likewise, connect is mapped to BeautiConnector to make things more readable.

beast.app.beauti.BeautiConnector
beast.app.beauti.BeautiSubTemplate

The sub-template has an id, which is used in combo-boxes in BEAUti to identify this model. The class attribute indicate that the sub-template produces a BEASTObject of this type. One task of BEAUti is to create new BEASTObjects, and connect them to an input. The way BEAUti finds potential BEASTObjects to connect an input with is by testing whether the type of an input is compatible with the type of a sub-template. If so, the sub-template is listed in the combo-box for the input in the BEASTObjectInputEditor. The mainid id identifies the BEASTObject that has the type listed in class. Note the $(n) part of the mainid attribute. A sub-template is created in the context of a partition, and wherever there is the string $(n) in the sub-template this is replaced by the name of the partition.


The sub-template starts with a CDATA-section describing the model. The order is not important, but here it was chosen to start with the five parameters that can be estimated for the model, the birth and death rates, interval times, sampling rate and original root height. Then there is an integer parameter that probably should be an integer input for the number of intervals in the skyline model. These parameters are used by the BirthDeathSkylineModel BEASTObject defined below. The BEASTObject has a tree as input, and assumes the main-template provides a tree with id Tree.t:$(n) where $(n) the partition name.



	

  	
	

 

The parameters need a proper prior, defined in the following few lines.


	


We do not only want to log the parameters, but also a few functions of parameters, namely the sampling rate divided by the sum of death and sampling rate. The RPNcalculator is a BEASTObject that can be used to calculate simple arithmetic expressions in reverse Polish notation, which is useful for logging.


	
	        




	 
	 
	 

]]>

To complete the sub-template, we need to define rules for connecting the BEASTObjects defined in the CDATA-section with relevant parts of the main template. BeautiConnector specifies a BEASTObject to connect from through the srcID attribute, a BEASTObject to connect to through the targetID template and a inputName specifies the name of the input in the target BEASTObject to connect with. Finally, the connector is only activated when some conditions are met. If the connection is not met, BEAUti will attempt to disconnect the link (if it exists). The conditions are separated by the ‘and’ keyword in the if attribute. The conditions are typically of the form inposterior(BirthDeathSky.$(n)), which tests whether the BEASTObject with id BirthDeathSky.$(n) is a predecessor of a BEASTObject with id posterior in the model. Further, there are conditions of the form Tree.$(n)/estimate=true, are typically used to test whether an input value of a BEASTObject is a certain value. This is mostly relevant to test whether StateNodes are estimated or not, since if they are not estimated no operator should be defined on it, and logging is not very useful. Let’s have a look at a few connectors.

Connectors are tested in order of appearance. It is always a good idea to make the first connector the one connecting the main BEASTObject in the sub-template, since if this main BEASTObject is disconnected, most of the others should be disconnected as well. For this tree-prior, the tree’s estimate flag can become false when the tree for the partition is linked.


end{lstlisting}}
Next, connect priors on parameters to the prior BEASTObject provided by the main template.
{color{blue}scriptsizebegin{lstlisting}[language=XML,firstnumber=last,breaklines=true]


Connect scale operators, and up-down operator. Note that the up-down operator takes as input the birth, death and sampling rate, but if one of these rates is fixed by the user in BEAUti, the link should be disconnected. So, there are three rules for deciding whether these rates should be connected to the operator.