Performance Suggestions

Most computing time is spent calculating the treelikelihood, so these tips concentrate on speeding up that calculation. There are no hard and fast rules; try a few combinations to see what works best for your data.

In general, when there are many patterns and/or a large number of states, BEAGLE GPU works best. Often BEAGLE SSE gives good performance on medium data sets, or when there are few patterns. For nucleotide data, ThreadedTreeLikelihood can be helpful.

BEAGLE

BEAGLE is a high performance library that efficiently calculates treelikelihoods on CPUs and GPUs. Note that a GPU is not necessary: BEAGLE can give considerable performance improvements on CPU as well. See http://beast.bio.ed.ac.uk/BEAGLE for more details. How it is installed and used with BEAST depends on the platform: http://code.google.com/p/beagle-lib/w/list
By default, BEAST tries to use BEAGLE for treelikelihood calculations if it is installed. However, this does not always lead to performance improvements, especially with nucleotide data where there are few patterns. You can start BEAST with the -java command line flag to ensure BEAGLE is not considered. Some likelihoods have a ‘useJava’ flag (see below) to ensure only Java is considered.

It is worth trying the -beagle_SSE option, which uses a CPU version optimised for the SSE instruction set, which most CPUs support.
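Since the best back end depends on the data, it can help to line up the candidate commands and time each one on the same XML file. The following is a minimal dry-run sketch, not part of BEAST: the helper name beast_cmd and the file name xyz.xml are illustrative, -java and -beagle_SSE are the flags named above, and -beagle_GPU is assumed to be the corresponding GPU flag.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the BEAST command for one likelihood
# back end so the same XML can be timed under each setting.
# -java and -beagle_SSE come from the text above; -beagle_GPU is an assumption.
beast_cmd() {
    mode=$1
    xml=$2
    case "$mode" in
        java) echo "beast -java $xml" ;;
        sse)  echo "beast -beagle_SSE $xml" ;;
        gpu)  echo "beast -beagle_GPU $xml" ;;
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Print one command per back end; run each by hand, for example under `time`.
for mode in java sse gpu; do
    beast_cmd "$mode" xyz.xml
done
```

The helper only prints commands, so nothing is launched until you choose to run one of them yourself.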

BEAGLE + BEAST 2 in cluster

ThreadedTreeLikelihood

In versions before v2.4.0, ThreadedTreeLikelihood was part of the BEASTlabs package (see Manage packages on how to install). ThreadedTreeLikelihood is a treelikelihood that splits up the patterns into equal parts and uses a thread for each of the parts.

The number of parts is determined by the number of threads (and can be specified using the ‘threads’ attribute). To use the ThreadedTreeLikelihood, open the XML file in a text editor and change

   spec="TreeLikelihood" 

to

   spec="ThreadedTreeLikelihood".
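For files with many partitions, the same change can be scripted instead of made by hand. This is only a sketch: the function name to_threaded and the file names are hypothetical, and it assumes the attribute appears exactly as spec="TreeLikelihood" (a fully qualified class name in your XML would need a different pattern).

```shell
#!/bin/sh
# to_threaded IN OUT (hypothetical helper): copy IN to OUT, switching every
# spec="TreeLikelihood" to spec="ThreadedTreeLikelihood".
# Elements that already use ThreadedTreeLikelihood are left unchanged,
# because the pattern requires TreeLikelihood directly after spec=".
to_threaded() {
    sed 's/spec="TreeLikelihood"/spec="ThreadedTreeLikelihood"/g' "$1" > "$2"
}

# Usage (writes a new file and keeps the original):
#   to_threaded xyz.xml xyz_threaded.xml
```

Writing to a new file rather than editing in place keeps the original XML available for comparison runs.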

The ‘useJava’ flag indicates that the calculation should use the Java treelikelihood and not consider BEAGLE. When using BEAST versions before v2.4.0, set useJava='false' to use BEAGLE:

   spec="ThreadedTreeLikelihood" useJava='false'

If you want to limit the number of threads used for splitting the patterns, use the threads attribute. For example, to limit the number of partitions to 3, use

   spec="ThreadedTreeLikelihood" threads='3'

To run multiple BEAGLE instances with ThreadedTreeLikelihood, the number of threads used to start BEAST must be at least the number of BEAGLE instances, so you would want to start BEAST using

   beast -instances 3 -threads 3 xyz.xml

if you want to use 3 BEAGLE instances (for versions before v2.4.0 use the -beagle_instances flag instead of -instances). Using only the ‘instances’ flag but not the ‘threads’ flag results in just 1 thread being created.
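The rule that the thread count must be at least the instance count can be captured in a small wrapper. The sketch below is illustrative only: beast_instances_cmd is a hypothetical helper that prints a command using the -instances and -threads flags described above, raising the thread count when it is too low.

```shell
#!/bin/sh
# beast_instances_cmd INSTANCES THREADS XML (hypothetical helper):
# print a BEAST command in which -threads is never lower than -instances.
beast_instances_cmd() {
    instances=$1
    threads=$2
    xml=$3
    # Raise the thread count if it would fall below the number of instances.
    if [ "$threads" -lt "$instances" ]; then
        threads=$instances
    fi
    echo "beast -instances $instances -threads $threads $xml"
}

# Example: asking for 3 instances with 1 thread prints a command with 3 threads.
beast_instances_cmd 3 1 xyz.xml
```

For versions before v2.4.0 the printed flag would need to be -beagle_instances instead of -instances.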

AncestralTreelikelihood

The beast-classic add-on (see Manage packages on how to install) has a Treelikelihood used for discrete phylogeography and ancestral reconstruction called AncestralTreelikelihood. Since it typically only uses a single site, threading does not help.

There is a flag ‘useJava’ to indicate the calculation should not consider BEAGLE.

Multiple partitions: CompoundDistribution

If you have multiple partitions, you can consider the useThreads flag of CompoundDistribution, which is false by default. If set to true, all distributions inside the CompoundDistribution will be calculated in parallel using the number of threads used to run BEAST.

        

 	 	 
 
   

If the treelikelihoods share parameters, e.g. through a relaxed clock model, this may not always be safe.

Particle Filter

For large analyses, getting through burn-in is a considerable waste of time. When a good starting point can be found, burn-in can be reduced, and a particle filter approach allows for finding a good starting point relatively efficiently.

For the adventurous, there is the beast.inference.ParticleLauncherByFile method in the BEASTLabs-add-on (see Manage packages on how to install). It runs a number of chains in parallel that communicate with each other through the file system at set intervals. When a chain gets too far behind, it samples a state from the other chains proportional to their posteriors (effectively taking the most likely most of the time).

To convert a BEAST XML file, replace the run entry

 

with

  
 
 #!/bin/sh
 cd $(dir)
 java -Dbeast.particle.dir=$(dir) -Dbeast.debug=false -Djava.library.path=$(java.library.path) -cp $(java.class.path) beast.app.BeastMCMC -resume -seed $(seed) $(dir)/beast.xml >> $(dir)/beast.log 2>&1 
 exit

and do not forget to close the mcmc element, just before the </run> closing tag

 
 
   
    ...
  
 	 	
        
   

Inside rootdir, it creates subdirectories particleX where X is a number from 0 to nrofparticles.

The text content of the run element is interpreted as a script, and it is executed for every particle. A few variables are replaced before the script is launched:

 $(dir): working directory for the particle, including rootdir specified in run (ParticleFilter) element
 $(java.class.path): class path used for launching BEAST
 $(java.library.path): library path used for launching BEAST
 $(seed): random number seed for the particle, each particle gets a unique seed
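To see what a script looks like after these variables are filled in, the substitution can be imitated with sed. This is a sketch of the idea only, not BEAST's own implementation: expand_template is a hypothetical helper, it handles just $(dir) and $(seed), and it assumes the values contain no | or & characters.

```shell
#!/bin/sh
# expand_template DIR SEED (hypothetical helper): read a script template on
# stdin and replace the literal tokens $(dir) and $(seed) with the values.
# Assumes DIR and SEED contain no '|' or '&', which sed would treat specially.
expand_template() {
    sed -e 's|\$(dir)|'"$1"'|g' -e 's|\$(seed)|'"$2"'|g'
}

# Example: preview one line of a particle script for particle0 with seed 42.
echo 'cd $(dir) && echo seed=$(seed)' | expand_template rootdir/particle0 42
```

Previewing the expanded script this way makes it easier to check paths before adapting the template to a cluster.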

You have to adapt the script for your own cluster. The example below is for the NeSI cluster using LoadLeveler.

  
 
   echo "#@ shell = /bin/sh"> $(dir)/particle.job
   echo "#@ job_name = birds">> $(dir)/particle.job
   echo "#@ class = default">> $(dir)/particle.job
   echo "#class = gpu">> $(dir)/particle.job
   echo "#@ group = nesi">> $(dir)/particle.job
   echo "#@ account_no = /nz/nesi">> $(dir)/particle.job
   echo "#@ notify_user = remco@cs.auckland.ac.nz">> $(dir)/particle.job
   echo "#@ notification = complete">> $(dir)/particle.job
   echo "#@ wall_clock_limit = 700:00:00">> $(dir)/particle.job
   echo "#@ environment = COPY_ALL">> $(dir)/particle.job
   echo "#@ node_resources = ConsumableMemory(4096mb) ConsumableVirtualMemory(4096mb)">> $(dir)/particle.job
   echo "#@ job_type = parallel">> $(dir)/particle.job
   echo "#@ node = 1">> $(dir)/particle.job
   echo "#@ tasks_per_node = 5">> $(dir)/particle.job
   echo "#@ initialdir = $(home)/birds/$(dir)">> $(dir)/particle.job
   echo "#@ output = $(job_name).$(jobid).out">> $(dir)/particle.job
   echo "#@ error = $(job_name).$(jobid).err">> $(dir)/particle.job
   echo "#@ queue">> $(dir)/particle.job
   echo "">> $(dir)/particle.job
   echo "export LD_LIBRARY_PATH=/usr/local/lib:/share/apps/cuda/lib64:/share/apps/cuda/lib:$LD_LIBRARY_PATH">> $(dir)/particle.job
   echo "module load cuda">> $(dir)/particle.job
   echo "">> $(dir)/particle.job
   echo "java -Djava.library.path=/usr/local/lib -Dbeast.particle.dir=/home/rbou019/birds/$(dir) -cp /home/rbou019/birds/BEAST_CLASSIC.jar:/home/rbou019/birds/jam.jar:/home/rbou019/birds/colt.jar beast.app.beastapp.BeastMain -beagle -beagle_SSE -seed $(seed) -threads 5 -resume beast.xml ">> $(dir)/particle.job
  #llsubmit $(dir)/particle.job
