[see also blog post onĀ Path sampling with a GUI.]
Contents
Set up
To set up a path sampling analysis, you need the model-selection package (pre BEAST version 2.1.2, it was in the beastii add-on) and edit the XML for the analysis.
Essentially, you need to wrap the run-element of the original analysis into a run element for the path sampler and rename the run element to mcmc like so
- rename the run element from the XML analysis into mcmc (do not forget to rename the closing element </run> into </mcmc>)
- insert before the mcmc element the following fragment:
cd $(dir) java -cp $(java.class.path) beast.app.beastapp.BeastMain $(resume/overwrite) -java -seed $(seed) beast.xml
- insert </run> after the closing </mcmc> element.
The new run element has the following parameters:
nrOfSteps the number of steps to use, default 8 alpha alpha parameter of Beta(alpha,1) distribution used to space out steps, default 0.3If alpha <= 0, uniform intervals are used. rootdir root directory for storing particle states and log files (default /tmp) mcmc MCMC analysis used to specify model and operations in each of the particles chainLength number of sample to run a chain for a single step Default: 100000 burnInPercentage burn-In Percentage used for analysing log files Default: 50 preBurnin number of samples that are discarded for the first step, but not the others Default: 100000 value script for launching a job. $(dir) is replaced by the directory associated with the particle $(java.class.path) is replaced by a java class path used to launch this application $(java.library.path) is replaced by a java library path used to launch this application $(seed) is replaced by a random number seed that differs with every launch $(host) is replaced by a host from the list of hosts hosts comma separated list of hosts. If there are k hosts in the list, for particle i the term $(host) in the script will be replaced by the (i modulo k) host in the list. Note that whitespace is removed deleteOldLogs delete existing log files from root dir doNotRun Set up all files but do not run analysis if true. This can be useful for setting up an analysis on a cluster
You can BEAST with multiple threads to make it run several chains in parallel (e.g. using the -threads option from the command line).
At the end of the run, the marginal likelihood estimate is printed. Also, in the directory specified by ‘rootdir’ for each step a subdirectory is created with the steps in it.
To recalculate the marginal likelihood estimates from the logs, run
appstore PathSampleAnalyser <nrofsteps> <alpha> <rootdir> <burninpercentage>
where nrofsteps, alpha, rootdir, burninpercentage should match the ones specified in the XML file.
You can use Tracer to inspect the log files for the various steps to make sure that th e burn-in used is sufficiently large.
Example file
In the $(BEAST-add-on)/model-selection/examples directory, there is a file testPathSampler.xml that should run, where $(BEAST-add-on) is the add-on directory on your operating system.
See Manage_Add-ons#Installation_directories for finding out what $(BEAST-add-on) is on your machine.
Setting up an analysis for a cluster
The path sampler will write shell scripts (for Mac, Linux) and batch files (for Windows) in the rootdir, one for every thread. These shell scripts can be executed separately. Once all scripts have been executed you need to run the PathSampleAnalyser to get the marginal likelihood estimate.
The number of batch files is determined by the number of threads used for running the XML. These batch files can be submitted to a cluster.
Calculating Bayes factors
At the end of a pathsampling run, the marginal likelihood estimate is reported, e.g.
Step theta likelihood contribution ESS 0 0 -19678.4774 -27.5334 15.1987 1 0.0015 -9950.7265 -113.8373 16.9272 2 0.0154 -6848.3262 -285.8846 39.1143 3 0.0593 -6153.3842 -580.8173 41.1282 4 0.1548 -6039.2476 -1028.7865 58.0664 5 0.3258 -6005.1028 -1633.722 24.3111 6 0.5982 -5990.5743 -2405.6227 14.0319 7 1 -5985.3503 0 8.9284 marginal L estimate =-6076.203746592951
You can use the value after “marginal L estimate” to compare models.
Say, the marginal likelihood estimate value for model 1 is X and the value for model 2 is Y, then the Bayes factor comparing model s1 and 2 is is X-Y. If this difference is positive, then the Bayes factor is in favour of model 1, if it is negative, it is in favour of model 2.
Resuming steps
If you suspect one or more of the steps did not run for long enough judging from a low ESS, you can resume the chain for that path for a little longer by resuming the chain.
The path sampler will write shell scripts (for Mac, Linux) and batch files (for Windows) in each of the step dirs called resume.sh or resume.bat. Execute them for the appropriate step and then run the PathSampleAnalyser.
SNAPP
Older versions of SNAPP (<1.1.10) use a non-standard MCMC loop. To use path sampling with a SNAPP analysis, you have to pre-process your XML file as follows:
1. replace snap.MCMC with beast.core.MCMC
2. copy the stateDistribution outside the run element, for example copy it to just before the run element.
3. See if the XML file runs. If not, and if there are any attributes that are not recognised on the run element, remove these.
Now you can follow the instructions above to do a path-sampling analysis.
Running as mulitple jobs
- In the XML, add doNotRun=’true’ to the run element. This ensures that you can start the analysis by hand, giving you control over which computer runs the jobs.
- Run BEAST on the XML file with -threads X where X is the number of separate jobs you want to run (e.g. the number of cores on the computer).
- Check that in the root directory specified in the XML for creating step directories, there are now X files called run0.sh run1.sh …. runX.sh on Mac or Linux and run0.bat, run1.bat,…, runX.bat on Windows.
- Start each of these scripts.
- Run the PathSampleAnalyser to get an estimate of the marginal likelihood.
Trouble shooting
Make sure no newline missing in the XML. The line that says
cd $(dir) java -cp $(java.class.path) beast.app.beastapp.BeastMain $(resume/overwrite) -java -seed $(seed) beast.xml
should be two lines
cd $(dir) java -cp $(java.class.path) beast.app.beastapp.BeastMain $(resume/overwrite) -java -seed $(seed) beast.xml
If you still have problems running, check that the script file (assuming your rootdir is /tmp/step/
/tmp/step/step0/run.bat
contains two lines, one starting with “cd” and one starting with “java”. If not, the newlines inserted in the XML did not get through to the batch file.
On windows note is the slashes in the rootdir. Not
rootdir="/temp/step/"
but
rootdir="tempsteps"