Tag Archives: BEAST 2

What is new in BEAST v2.1.3 and its packages — this site is deprecated: go to https://www.beast2.org/ for up-to-date information

12 May 2014 by Remco Bouckaert

The main reason for this release is to fix bugs in BEAUti for *BEAST — previously, the NodeReheight operator did not get assigned any gene trees. As a consequence, the topology of the species tree did not change (much) in an analysis produced by BEAUti 2.1.2. Further, when a calibration with timing information is placed on the species tree in BEAUti, the clock rates of all gene trees in the analysis should be estimated. By default, one of the genes has its clock rate fixed to one, while the remaining genes have their clock rates estimated. BEAUti now ensures that the clock rates of all genes are automatically set to estimated (if Mode/Automatic set clock rates is checked, which it is by default). A regression test was added to the code to ensure that the *BEAST tutorial (as well as the other two tutorials) produces the expected numbers.

An AppStore application has been added that can be started by double clicking the AppStore icon, just like one would start BEAST or BEAUti. The AppStore makes it easy to start applications defined in packages, such as the TreeSetAnalyser in SNAPP for producing statistics from tree files produced in a SNAPP analysis. Starting package applications otherwise involves cumbersome setting up of class paths on the command line.

Fasta files were mostly classified as amino acid data when imported into BEAUti. Now the file extension is used (.fna = nucleotide), and a smarter pattern-matching rule is applied to detect whether the type is nucleotide rather than amino acid.

When using a log-normal distribution in a prior, BEAUti now displays the distribution properly in the chart.

When an MRCA prior is put on a clade and the clade is selected as monophyletic, the proportion of time the clade is monophyletic used to be part of the log file and would get stuck at 1. With many such priors, these clogged up the display in Tracer. Now, this statistic is not logged any more.

The upgrade button in the PackageManager now actually upgrades a package.

BEASTShell, a package for scripting BEAST, is under heavy development and a very early release is already available (see details).

Zipping up a package for BEAST

5 May 2014 by Remco Bouckaert

There are two ways to put everything you need for a BEAST package into a zip file so that it can be used by the package manager:

  • If you like ant, the easiest way is to re-use a build.xml file
  • Otherwise, you can zip up a package by hand

The first way has the disadvantage that you might not quite understand everything that goes on. So, to my surprise, many people opt for the second way — hence the explanation below, which tells you everything you need to know to make it work.

Using ant

Assumptions for this way to work:

  • You have your project organised such that the BEAST2 source code is in the beast2 directory and your package source code is in beast2/../mypackage, that is, mypackage is a sibling directory of beast2.
  • The source code is inside the mypackage/src directory and you use the beast package for classes you want to distribute.
  1. Copy a build.xml and version.xml file from an existing project into your package directory. I’ll assume these get copied from the RBS package, but you can use the XML files from other projects as well.
  2. Search/replace every occurrence of RBS with the name of your package in build.xml.
  3. You may want to change the files that are copied. Go to the “addon” target in the build.xml file (where it says <target name="addon">), where you see a few <copy> elements. You may need to delete some entries if you do not have any BEAUti templates, or rename them.
  4. Do the same with version.xml.
  5. Run ant addon from the command line inside the mypackage directory.

The screen output — a fragment shown below — will show where the zip file containing the package will be created, in this case /Users/remcobouckaert/workspace/mypackage/build/dist/MyPackage.addon.v1.0.0.zip:

...

addon:
   [delete] Deleting directory /Users/remcobouckaert/workspace/mypackage/release/add-on
    [mkdir] Created dir: /Users/remcobouckaert/workspace/mypackage/release/add-on
    [mkdir] Created dir: /Users/remcobouckaert/workspace/mypackage/release/add-on/lib
    [mkdir] Created dir: /Users/remcobouckaert/workspace/mypackage/release/add-on/examples
    [mkdir] Created dir: /Users/remcobouckaert/workspace/mypackage/release/add-on/templates
     [copy] Copying 1 file to /Users/remcobouckaert/workspace/mypackage/release/add-on
     [copy] Copying 47 files to /Users/remcobouckaert/workspace/mypackage/release/add-on/examples
     [copy] Copying 1 file to /Users/remcobouckaert/workspace/mypackage/release/add-on/lib
     [copy] Copying 1 file to /Users/remcobouckaert/workspace/mypackage/release/add-on
     [copy] Copying 2 files to /Users/remcobouckaert/workspace/mypackage/release/add-on/templates
      [jar] Building jar: /Users/remcobouckaert/workspace/mypackage/build/dist/MyPackage.addon.v1.0.0.zip
     [echo] Add-on version v1.0.0 release is finished.

BUILD SUCCESSFUL
Total time: 9 seconds

Roll your own

You might want to read the relevant chapter in the BEAST book to start with.

BEAST looks for jar files inside the package directory where each package has its own subdirectory. It only picks up jar files if they are in the lib directory. Likewise, BEAUti templates are only picked up from XML files in the templates directory.

By convention, the following directory structure with some of the files is expected:

   META-INF/
   META-INF/MANIFEST.MF
   examples/                 // example XML or JSON files
   examples/myExample.xml
   doc/                      // user documentation
   doc/mypackage.pdf
   lib/                      // jar files with executable code and dependent libraries
   lib/mypackage.addon.jar
   templates/                // BEAUti template files
   templates/MyTemplate.xml
   MyPackage.src.jar         // zipped up source files
   version.xml               // version and dependency information

The version.xml file tells the version of the package and which other packages — and their versions — this package relies on.
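
A minimal sketch of what such a version.xml can look like (the element and attribute names below are assumptions based on BEAST 2.1-era packages and may differ in other BEAST versions):

    <addon name='MyPackage' version='1.0.0'>
        <!-- this package needs at least BEAST core 2.1.0 -->
        <depends on='beast2' atleast='2.1.0'/>
    </addon>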

So, you should create the above files in a structure similar to the above and zip them up. Note that you want to compile the java classes to be Java 1.6 compatible if you only use 1.6 code. Also, make sure that the jar with binary classes is in the lib directory (not mypackage/lib or addon/lib). Finally, any BEAUti templates should go in the templates directory.

You might want to name your zip file something along the lines of mypackage.addon.v1.0.0.zip to keep track of version numbers and to indicate the purpose of the file.
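
For example, assuming the files are laid out as above in the current directory, a command along these lines keeps lib/ and templates/ at the top level of the archive (zip is a standard command-line tool, not part of BEAST; adjust the file list to what your package actually contains):

    zip -r MyPackage.addon.v1.0.0.zip version.xml lib templates examples doc MyPackage.src.jar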

Troubleshooting

The ant build fails: typically, ant produces sensible error messages, so read these! Typos, missing files, and misnamed directories are not uncommon.

BEAST does not load the jar file: perhaps it is in the wrong location. BEAST expects mypackage.jar to be in the location $BEAST/2.1/MyPackage/lib/mypackage.jar, where $BEAST is the package directory for your OS and 2.1 is the version number (major=2, minor=1, patch = any) of BEAST. If you happen to zip all files in the addon directory, your jar file may end up in $BEAST/2.1/MyPackage/addon/lib/mypackage.jar, where it is ignored.

BEAST says wrong version number: the jar file is compiled against a Java version newer than 1.6. Many users are still using Java 6, so we try to keep the code Java 1.6 compatible for now. To fix this, ensure the jar file is compiled to be 1.6 compatible.

BEAUti template does not load: perhaps it is not in the correct location. BEAUti tries to load templates of packages by loading any XML file from the directory $BEAST/2.1/MyPackage/templates/. If your template is in another location, it will be ignored.

Installing the package

Some options:

  • let me know the url where the zip file is available on the web and I will add it to the list of packages that BEAUti uses to manage packages. This way, everyone that installs BEAST/BEAUti has access to the package immediately.
  • ask users to unzip your zip file in the BEAST package directory, which can be a bit awkward.
  • maintain your own package XML file (similar to the one here), and tell users the url of the package XML file. In the package manager in BEAUti, this url can be added using the Manage package locations button. A rough sketch of such a file is given below.
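
As a hypothetical sketch only (the attribute names here are my guesses, so check an existing package list for the exact format):

    <packages>
        <addon name='MyPackage' version='1.0.0'
               url='http://example.org/MyPackage.addon.v1.0.0.zip'
               description='What MyPackage does'/>
    </packages>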

Simulation studies with BEAST 2

28 April 2014 by Remco Bouckaert

Ever wondered whether the gamma shape that is estimated has any relation to the gamma shape that generated the data? Probably not, but there are many cases where you want to find out whether a model that you use has any relation to the process that generated the data. If so, you probably want to do a simulation study and observe how parameters that generate sequence data have an impact on the parameters of the model that you use to analyse the data.

BEAST has a number of tools that you can use to do a simulation study. Here, we will check what happens when you change the rate heterogeneity used to generate the data, and how this impacts the gamma shape estimate when using the gamma rate heterogeneity model.

Generating an alignment

First of all, we will set up a simulator using the SequenceSimulator class in BEAST. At the time of writing, no GUI is available, so some XML hacking is required.

We will generate a sequence alignment on a fixed 3-taxon tree using the HKY substitution model (kappa=5) and 4 rate categories, with gamma shape values ranging from 0.2 to 15. The following XML shows how to do this:

    [XML not preserved in this copy of the post: it defined three “?” placeholder sequences, a fixed three-taxon tree, and an HKY site model with 4 gamma categories and shape 1.0.]
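As a rough guide, a hand-written simulator file can look something like the sketch below. This is a reconstruction rather than the post's original XML: the spec and input names (SequenceSimulator, sequencelength, outputFileName, TreeParser) are assumptions based on the BEAST 2 code, and the taxon labels and branch lengths are arbitrary placeholders.

    <beast version="2.0"
           namespace="beast.core:beast.core.parameter:beast.evolution.alignment:beast.evolution.tree:beast.evolution.sitemodel:beast.evolution.substitutionmodel:beast.evolution.branchratemodel:beast.app.seqgen">

        <!-- three dummy "?" sequences define the taxa -->
        <data id="alignment" dataType="nucleotide">
            <sequence taxon="t1" value="?"/>
            <sequence taxon="t2" value="?"/>
            <sequence taxon="t3" value="?"/>
        </data>

        <!-- fixed tree to simulate on -->
        <tree id="tree" spec="beast.util.TreeParser" taxa="@alignment"
              IsLabelledNewick="true" newick="((t1:0.1,t2:0.1):0.1,t3:0.2)"/>

        <run spec="SequenceSimulator" data="@alignment" tree="@tree"
             sequencelength="10000" outputFileName="gammaShapeSequence.xml">
            <!-- HKY with kappa=5 and 4 gamma rate categories, shape 1.0 -->
            <siteModel spec="SiteModel" gammaCategoryCount="4">
                <parameter name="shape">1.0</parameter>
                <substModel spec="HKY">
                    <parameter name="kappa">5.0</parameter>
                    <frequencies spec="Frequencies">
                        <parameter name="frequencies">0.25 0.25 0.25 0.25</parameter>
                    </frequencies>
                </substModel>
            </siteModel>
            <branchRateModel spec="StrictClockModel">
                <parameter name="clock.rate">1.0</parameter>
            </branchRateModel>
        </run>
    </beast>

Changing the shape value (1.0 here) and re-running gives the different levels of rate heterogeneity used below.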

If you save the XML as simulator.xml you can run it using BEAST, and it will produce a file called gammaShapeSequence.xml containing a sequence alignment with 10000 characters.

Setting up an analysis

The next step is to create a BEAST MCMC analysis file. The easiest way to do this is using BEAUti. Open BEAUti with the (default) Standard template. Import the alignment from the generated file gammaShapeSequence.xml. Go to the Site Model panel and change the substitution model to HKY. Then set the number of gamma categories to 4, and tick the estimate checkbox next to the shape parameter. The panel should look something like this:

Save the file as analysis.xml. After running the analysis with BEAST and using Tracer to inspect the log file SequenceSimulator.log, Tracer could report something like this:

As you can see, the gamma shape is estimated as 1.287 with a 95% HPD interval of [0.8596, 1.7296], so the original gamma shape value of 1.0 is inside the 95% HPD interval. So, it appears that the gamma shape parameter can be estimated. How about the bias? Perhaps we should run the simulation a few more times and see how it behaves over, say, 10 independently generated alignments.

Repeating this 10x

To repeat the above — simulate sequences, merge with an analysis, and run the analysis — say 10 times, we can adjust the XML script simply by adding the following:

  • Add an iterations attribute to the run element, setting iterations="10".
  • Add a merge element indicating that the generated alignment should be merged with the XML representing an analysis — it replaces the alignment in a BEAST XML file with the one generated by the SequenceSimulator. Since we just created the file analysis.xml with the kind of analysis we want to run, we can use that file.
  • Note that you can specify more than one merge file, and every template file has its alignment replaced with the same generated alignment. This is handy if you want to find out how model misspecification impacts your estimates.

    The XML should look something like this:

    [XML not preserved in this copy of the post: it is the simulator XML above, extended with iterations="10" on the run element and a merge element pointing at analysis.xml.]
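    A sketch of the modified run element (again a reconstruction rather than the original post's XML; the MergeDataWith spec and its template/output inputs are assumptions based on the seqgen code):

    <run spec="SequenceSimulator" iterations="10" data="@alignment" tree="@tree"
         sequencelength="10000">
        <!-- site model and branch rate model exactly as in the simulator sketch above -->
        <merge spec="MergeDataWith" template="analysis.xml" output="analysis-$(n).xml"/>
    </run>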

    Save the file as simulator10.xml in the directory containing analysis.xml. When you run this XML file with BEAST, 10 files should be generated, named analysis-0.xml, analysis-1.xml, up to analysis-9.xml. You could run the ten XML files with BEAST, but you will quickly find that every log file has the same name — running each file in its own directory, or renaming log files after running an analysis, is quite laborious.

    Fortunately, the sequence simulator, before it replaces the alignment in the analysis file, replaces every instance of “$(n)” (a dollar sign followed by “(n)”) by the iteration number. So, if we change the file names of the loggers in analysis.xml (for example, from fileName="SequenceSimulator.log" to fileName="SequenceSimulator-$(n).log" for the trace log, and similarly for the tree log), every generated analysis writes to its own log files.

    It is probably also a good idea to set chainLength="4000000" in analysis.xml. Then rerun BEAST on simulator10.xml.

    Now we can run the 10 generated analysis files and get the log files SequenceSimulator-0.log, SequenceSimulator-1.log, … SequenceSimulator-9.log.

    The 10 log files can be analysed by loading them all into Tracer. We can then select the combined trace, which looks something like this:

    The mean estimate over the 10 runs is 1.0752 with a 95% HPD interval of [0.7423, 1.503], so the gamma shape used for generating the data is close to the mean and firmly inside the 95% HPD interval.

    Repeating 10x for 21 different gamma shape values

    So, it looks like the gamma shape value can be inferred when its true value is 1.0. How about other values? To test this, we can change the simulator.xml file for different values of the shape parameter, and run the analysis 10 times for each gamma shape.

    Analysing with LogAnalyser and R

    Doing this for 21 values of the shape parameter gives 210 log files, which becomes a bit cumbersome to analyse with Tracer. If you were to run the same analysis with a different template (e.g. with a GTR model instead of HKY), you would get another 210 log files.

    BEAST has a LogAnalyser that produces output similar to Tracer's, in table format. To run it, you can use

    java -cp beast.jar  beast.util.LogAnalyser trace.log

    Running it on one of the trace logs produces output like this:

    ~> java -cp beast.jar beast.util.LogAnalyser SequenceSimulator-9.log
    Loading SequenceSimulator-9.log skipping 400 log lines
    
    |---------|---------|---------|---------|---------|---------|---------|---------|
    *********************************************************************************
    
    Calculating statistics
    
    |---------|---------|---------|---------|---------|---------|---------|---------|
    ********************************************************************************
    
    item            mean     stderr   stddev   median   95%HPDlo 95%HPDup ACT      ESS      geometric-mean 
    posterior       -28426.3 0.104762 1.962981 -28426.0 -28430.0 -28422.9 10256.42 351.0971 NaN     
    likelihood      -28422.3 0.104126 1.814905 -28422.0 -28425.7 -28419.3 11853.12 303.8017 NaN     
    prior           -4.05647 0.01505  0.768483 -3.79066 -5.60204 -3.26877 1381.170 2607.208 NaN     
    treeLikelihood  -28422.3 0.104126 1.814905 -28422.0 -28425.7 -28419.3 11853.12 303.8017 NaN     
    TreeHeight      0.200092 0.000399 0.006715 0.199813 0.187791 0.213805 12723.26 283.0248 0.19998 
    YuleModel       0.009806 0.013149 0.757205 0.290776 -1.48896 0.721453 1085.933 3316.040 NaN     
    birthRate       5.396959 0.054138 3.102894 4.803029 0.51353  11.32685 1096.214 3284.940 4.521489
    gammaShape      1.198491 0.013759 0.195758 1.186229 0.853668 1.591099 17788.98 202.4286 1.183323
    kappa           4.993622 0.017951 0.231987 4.993078 4.55073  5.396354 21561.68 167.0091 4.988256
    freqParameter.1 0.25509  0.000216 0.003752 0.254981 0.248605 0.262504 11883.76 303.0184 0.255063
    freqParameter.2 0.24809  0.000193 0.003549 0.24805  0.241616 0.255487 10634.58 338.6120 0.248064
    freqParameter.3 0.246065 0.00022  0.003662 0.245816 0.239572 0.25322  13036.39 276.2267 0.246038
    freqParameter.4 0.250755 0.000213 0.003567 0.250743 0.244575 0.258267 12875.19 279.6851 0.250729
    

    For this run, the estimated gamma shape is 1.198491 with an ESS of 202.

    By specifying the prefix directive, you can add a column to identify the run by a prefix value. For example, setting -Dprefix=1.0 like so:

    ~> java -Dprefix=1.0 -cp beast.jar beast.util.LogAnalyser SequenceSimulator-9.log
    
    Loading SequenceSimulator-9.log skipping 400 log lines
    
    |---------|---------|---------|---------|---------|---------|---------|---------|
    *********************************************************************************
    
    Calculating statistics
    
    |---------|---------|---------|---------|---------|---------|---------|---------|
    ********************************************************************************
    
    item            prefix mean     stderr   stddev   median   95%HPDlo 95%HPDup ACT      ESS      geometric-mean 
    posterior       1.0 -28426.3 0.104762 1.962981 -28426.0 -28430.0 -28422.9 10256.42 351.0971 NaN     
    likelihood      1.0 -28422.3 0.104126 1.814905 -28422.0 -28425.7 -28419.3 11853.12 303.8017 NaN     
    prior           1.0 -4.05647 0.01505  0.768483 -3.79066 -5.60204 -3.26877 1381.170 2607.208 NaN     
    treeLikelihood  1.0 -28422.3 0.104126 1.814905 -28422.0 -28425.7 -28419.3 11853.12 303.8017 NaN     
    TreeHeight      1.0 0.200092 0.000399 0.006715 0.199813 0.187791 0.213805 12723.26 283.0248 0.19998 
    YuleModel       1.0 0.009806 0.013149 0.757205 0.290776 -1.48896 0.721453 1085.933 3316.040 NaN     
    birthRate       1.0 5.396959 0.054138 3.102894 4.803029 0.51353  11.32685 1096.214 3284.940 4.521489
    gammaShape      1.0 1.198491 0.013759 0.195758 1.186229 0.853668 1.591099 17788.98 202.4286 1.183323
    kappa           1.0 4.993622 0.017951 0.231987 4.993078 4.55073  5.396354 21561.68 167.0091 4.988256
    freqParameter.1 1.0 0.25509  0.000216 0.003752 0.254981 0.248605 0.262504 11883.76 303.0184 0.255063
    freqParameter.2 1.0 0.24809  0.000193 0.003549 0.24805  0.241616 0.255487 10634.58 338.6120 0.248064
    freqParameter.3 1.0 0.246065 0.00022  0.003662 0.245816 0.239572 0.25322  13036.39 276.2267 0.246038
    freqParameter.4 1.0 0.250755 0.000213 0.003567 0.250743 0.244575 0.258267 12875.19 279.6851 0.250729
    

    This makes it easy to use, for example, an R script to find the estimates associated with a particular value used for generating the data.

    If you specify more than one indicator, for example the iteration number and the shape parameter, just put them inside quotes, separated by spaces, like so:

    ~> java -Dprefix="1.0 9" -cp beast.jar beast.util.LogAnalyser SequenceSimulator-9.log
    

    The output now has two columns, with headers labeled prefix0 and prefix1 and column values 9 1.0:

    item            prefix0 prefix1 mean     stderr   stddev   median   95%HPDlo 95%HPDup ACT      ESS      geometric-mean 
    posterior       9 1.0 -28426.3 0.104762 1.962981 -28426.0 -28430.0 -28422.9 10256.42 351.0971 NaN     
    likelihood      9 1.0 -28422.3 0.104126 1.814905 -28422.0 -28425.7 -28419.3 11853.12 303.8017 NaN     
    prior           9 1.0 -4.05647 0.01505  0.768483 -3.79066 -5.60204 -3.26877 1381.170 2607.208 NaN     
    treeLikelihood  9 1.0 -28422.3 0.104126 1.814905 -28422.0 -28425.7 -28419.3 11853.12 303.8017 NaN     
    TreeHeight      9 1.0 0.200092 0.000399 0.006715 0.199813 0.187791 0.213805 12723.26 283.0248 0.19998 
    YuleModel       9 1.0 0.009806 0.013149 0.757205 0.290776 -1.48896 0.721453 1085.933 3316.040 NaN     
    birthRate       9 1.0 5.396959 0.054138 3.102894 4.803029 0.51353  11.32685 1096.214 3284.940 4.521489
    gammaShape      9 1.0 1.198491 0.013759 0.195758 1.186229 0.853668 1.591099 17788.98 202.4286 1.183323
    kappa           9 1.0 4.993622 0.017951 0.231987 4.993078 4.55073  5.396354 21561.68 167.0091 4.988256
    freqParameter.1 9 1.0 0.25509  0.000216 0.003752 0.254981 0.248605 0.262504 11883.76 303.0184 0.255063
    freqParameter.2 9 1.0 0.24809  0.000193 0.003549 0.24805  0.241616 0.255487 10634.58 338.6120 0.248064
    freqParameter.3 9 1.0 0.246065 0.00022  0.003662 0.245816 0.239572 0.25322  13036.39 276.2267 0.246038
    freqParameter.4 9 1.0 0.250755 0.000213 0.003567 0.250743 0.244575 0.258267 12875.19 279.6851 0.250729
    

    Using the LogAnalyser, you can create a text file containing the estimates for all logs. Doing this for the various values of the gamma shape:

    java -Dprefix=0.2 -cp beast.jar beast.util.LogAnalyser SequenceSimulator-0.2.log >> log.dat
    java -Dprefix=0.3 -cp beast.jar beast.util.LogAnalyser SequenceSimulator-0.3.log >> log.dat
    java -Dprefix=0.4 -cp beast.jar beast.util.LogAnalyser SequenceSimulator-0.4.log >> log.dat
    ...
    java -Dprefix=15.0 -cp beast.jar beast.util.LogAnalyser SequenceSimulator-15.log >> log.dat
    

    This produces a log.dat file that you can use in R for a plot. In R, we can do the following — assuming we generated alignments with shape parameter values 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5, 3.0, 4.0, 5.0, 7.5, 10.0, 15.0:

    > x <- read.table("log.dat",header=1)
    > s <- c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5, 3.0, 4.0, 5.0, 7.5,10.0, 15.0)
    > plot(s,s,log="xy",ann=0);
    > lines(s,as.numeric(as.character(x[which(x$item=="gammaShape"),"mean"])))
    > title("3 taxa, gamma site model with varying shape parameter", xlab="shape used for generating data", ylab="estimated shape")
    

    This gives the following graph, showing the estimate of the shape parameter against the value used for simulating the data. Overall, there is a close match, except for large values of the shape parameter.

    BEAST 2 paper is out!

    21 April 2014 by Remco Bouckaert

    The BEAST paper “BEAST 2: A software platform for Bayesian evolutionary analysis” by Remco Bouckaert, Joseph Heled, Denise Kuehnert, Tim Vaughan, Chieh-Hsi Wu, Dong Xie, Marc Suchard, Andrew Rambaut, and Alexei J Drummond is now available from PLOS Computational Biology 10(4), in case you want to cite BEAST 2 (which 3 people already did on the date of publication, according to Google Scholar 🙂). PLOS Comp Bio is one journal in the series of open access PLOS publications, with a knowledgeable editorial board and a fast review and publication process, so there is a lot to commend it.

    However, (disclaimer: the following is my personal opinion/experience and does not necessarily reflect any of those of my co-authors) the publication process left some things to be desired. Previously I published with the Journal of Machine Learning Research, which I would consider the ideal publisher: it is open access, has a high impact factor and there are no fees. Let’s look at some specific issues comparing PLOS and JMLR.

    Cost

    For JMLR, MIT provides the web server, saving the few hundred dollars per year it would otherwise cost for commercial web hosting. According to an insider, the editor in chief and editorial board are volunteers and the webmaster is a student volunteer — note that I never noticed any change in the lean submission and manuscript management system through the years; the system just works and does not require any development (even though it looks a bit unpolished). As for most journals, reviewers are unpaid volunteers, so there are no costs involved there. The highest cost involved is a tax accountant, and the average cost per paper is estimated at $6.50, covered by donations and MIT.

    PLOS Comp Bio on the other hand charges $2300 per paper, to be paid by the authors. Now, JMLR is an extreme example in that it relies on volunteers to a large extent, and I appreciate that not every journal could be run that way. Also, $2300 is much lower than the more than $5000 ($8 billion revenue for 1.5 million papers) that commercial publishers receive per paper on average, but still, it is a significant amount.

    Paper submission

    For JMLR, submitting a paper does not take a lot of effort — it is a one-page form that only requires entering all authors on your paper, uploading the pdf of the paper and a cover letter (if you want to), and entering the title. Apparently, that is all that is required for running a high quality journal for a specialised field. Recommending reviewers is not required, since the editorial board is sufficiently knowledgeable to know the people in the field. If you need to declare a conflict of interest, this can be done in the cover letter, and one does not have to jump through hoops like copying a standard phrase into a text field declaring that no conflict of interest exists.

    Compare that to submissions to PLOS: there are many HTML pages involved, and lots of information is required — such as conflicts of interest — before one is able to complete a submission. Just going through the process of submitting a paper and filling in the forms requires setting aside a couple of hours.

    Guidelines

    For JMLR, a Latex style file (jmlr2e.sty) is provided that is easy to use. You prepare your file as any Latex file and have full control over how it looks.

    JMLR's article format instructions are 2866 words (when copying the page and counting in Word), while PLOS's are just over 7000 words, more than double that. And that is not all: there are separate guidelines for Latex documents, figures and tables, and supporting information, none of which an author should be burdened with, since this is where a publisher could add value but apparently chooses not to.

    Production process

    JMLR's author guidelines are short and to the point, and contain an 8-point checklist that is helpful when submitting your paper. PLOS, on the other hand, insists that each author becomes an expert in publishing production processes — requiring skills such as converting between graphics file formats, embedding fonts in EPS files, scaling graphics to size, etc. None of these skills is required when using Latex with JMLR — you can just use your favourite file format, stick with that, and submit the Latex output.

    I appreciate that PLOS publishes in different formats, and given the Latex source, it should be able to do so. Unfortunately, that process is broken — in the case of the BEAST paper it did not convert characters with umlauts properly and it inserted random characters. One would think that such issues would have been fixed by now. The amount of time my co-authors and I wasted on tedious typesetting issues like these would have been zero had we published with a system like the one JMLR provides.

    Most authors who publish with PLOS do not use Latex, so this might be a relatively convenient process for the majority of papers, which are submitted as Word documents, but I think there is room for improvement.

    Alternatives

    I am picking out PLOS here because that is where the BEAST paper is published, but there are many other journals that come with a similarly uncomfortable production process.

    Unfortunately, for most bioinformatics papers, JMLR is not an option and it would not have been appropriate for the BEAST paper. I am curious about experience with PeerJ, which appears to offer a smooth interface for submitting papers in Latex. I am not quite sure about the requirements for images though — the author guidelines (a pleasant 3000 word document) still mention the arcane EPS format.

    Programming BEAST without Java

    14 April 2014 by Remco Bouckaert

    If you want to log, say, a simple function of a parameter or a set of parameters, programming a logger in Java is often overkill. A much simpler way is to use some of the scripting functionality that is in BEAST. There are a number of options:

  • RPNCalculator, which takes expressions in reverse Polish notation
  • ExpCalculator in the feast package, which takes arithmetic expressions
  • Script in the BEASTLabs package, which takes complex arithmetic expressions as well as functions

    RPNCalculator

    RPNCalculator is the simplest and most primitive of the lot. It takes as input a set of Functions and an expression in reverse Polish notation (RPN). RPN is stack based: arguments come first, and whenever an operator is pushed on the stack, it uses the top-most positions to execute the operator. So “2 3 +” returns 5, and “2 2 2 / /” returns 0.5, as does “2 2 2 * /”. Variable names are resolved by the IDs of the arguments. Below is a complete XML fragment:

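    A hypothetical sketch (the spec path beast.math.statistic.RPNcalculator and the input names here are my assumptions, so check the class documentation for the exact form), logging the sum of two parameters:

    <parameter id="a" value="2.0"/>
    <parameter id="b" value="3.0"/>

    <logger logEvery="1000" fileName="rpn.log">
        <!-- expression in reverse Polish notation; variables resolve to the IDs of the arguments -->
        <log spec="beast.math.statistic.RPNcalculator" expression="a b +">
            <parameter idref="a"/>
            <parameter idref="b"/>
        </log>
    </logger>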

    ExpCalculator

    ExpCalculator can be found in the feast package and allows simple expressions to be used as a Function. For example, to calculate the Euclidean distance to the point (20,50), you could use the following:

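    A hypothetical sketch (the feast spec name, the value attribute and the arg elements are my assumptions about feast's interface, so check the feast documentation for the exact form):

    <parameter id="x" value="12.3"/>
    <parameter id="y" value="45.6"/>

    <logger logEvery="1000" fileName="distance.log">
        <!-- Euclidean distance from (x, y) to the point (20, 50) -->
        <log spec="feast.expressions.ExpCalculator"
             value="sqrt((x - 20)^2 + (y - 50)^2)">
            <arg idref="x"/>
            <arg idref="y"/>
        </log>
    </logger>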

    There is also an ExpCalculatorDistribution that you can use as a distribution, where you specify an expression representing the log-density of the distribution. For example — ignoring constants — a normal distribution could be specified like so:

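    Along the same lines (with the same caveat that the exact spec and input names are assumptions), a normal log-density, ignoring constants, for a parameter x with mean 1 and standard deviation 0.5 could look like:

    <distribution spec="feast.expressions.ExpCalculatorDistribution"
                  value="-(x - 1.0)^2 / (2 * 0.5^2)">
        <arg idref="x"/>
    </distribution>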

    which can be used, for example, in the prior. For more information, see the feast package.

    Script

    To use Script you need to install the BEASTLabs package (see the BEAST website for how to install packages).

    With beast.util.Script you can now run complex mathematical expressions like

    3 * sin(a[0]) + log(a[1]) * b 

    where a is a Function with 2 dimensions and b a single-valued Function. Parameters and Distributions are Functions, so you can use these in your expressions. Since Script is also a Function, you can use the result of one Script in another Script.

    Script has an input named x that takes the Functions, and the variable names in the expression must match the IDs of those input values. A complete XML fragment for logging the above expression with two parameters a and b could look something like this:

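    A sketch of such a fragment, reusing the parameter values from the original example (the exact wiring of the x inputs and the logger is my assumption):

    <parameter id="a" value="1.0 2.0"/>
    <parameter id="b" value="3.0"/>

    <logger logEvery="1000" fileName="script.log">
        <!-- variable names in the expression match the IDs of the x inputs -->
        <log spec="beast.util.Script" expression="3 * sin(a[0]) + log(a[1]) * b">
            <x idref="a"/>
            <x idref="b"/>
        </log>
    </logger>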

    With Script, you can define complex and recursive functions, for example factorial, like so:

    function fac(x) {
            if (x <= 1) {return 1;}
            return x * fac(x-1);
    }
    
    function f(a) {return fac(a);}
    

    Note that if you specify a script instead of an expression, the engine always calls function f with the inputs x as arguments, in order of appearance. The function specification goes as text inside the element, unlike an expression, which goes inside the expression attribute. An XML fragment logging the factorial of parameter a could look something like this:

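    A sketch of such a fragment; note that the script goes as text inside the element and that, as explained below, the <= has to be written as &lt;= because the text is part of the XML:

    <parameter id="a" value="5.0"/>

    <logger logEvery="1000" fileName="fac.log">
        <log spec="beast.util.Script">
            <x idref="a"/>

            function fac(x) {
                if (x &lt;= 1) {return 1;}
                return x * fac(x-1);
            }

            function f(a) {return fac(a);}
        </log>
    </logger>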

    Note that because the text is part of the XML, any special XML character (especially '<' and '&') needs to be escaped. The <= in the above script must be replaced by &lt;=. To prevent this, you can wrap the text in a CDATA block, whose content is not interpreted as XML but taken as is.

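    The same sketch with the script wrapped in a CDATA block, so the <= can stay as it is:

    <parameter id="a" value="5.0"/>

    <logger logEvery="1000" fileName="fac.log">
        <log spec="beast.util.Script">
            <x idref="a"/>
            <![CDATA[
            function fac(x) {
                if (x <= 1) {return 1;}
                return x * fac(x-1);
            }

            function f(a) {return fac(a);}
            ]]>
        </log>
    </logger>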

    Script syntax

    By default, Script uses JavaScript syntax for coding expressions and functions. For expressions, the Math scope is used, so the following functions are available: abs(), acos(), asin(), atan(), atan2(), ceil(), cos(), exp(), floor(), log(), max(), min(), pow(), random(), round(), sin(), sqrt(), tan(). A disadvantage is that debugging is awkward, to put it mildly: if the script does not compile, the ScriptEngine does not offer many clues as to where things go wrong.

    You can specify other script engines by setting the engine attribute to one of JavaScript, python, jruby, or groovy. Provided the script engine is available on your class path, you can use that syntax for specifying a script (though not for expressions). For example, to use python, you need to include jython in the class path and set engine='python' on the Script element. A factorial logger could be done like this in python:

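    A sketch of the python version; the engine attribute is described above, while the rest of the wiring is, as before, my assumption:

    <parameter id="a" value="5.0"/>

    <logger logEvery="1000" fileName="fac.log">
        <log spec="beast.util.Script" engine="python">
            <x idref="a"/>
            <![CDATA[
def fac(x):
    if x <= 1:
        return 1
    return x * fac(x - 1)

def f(a):
    return fac(a)
            ]]>
        </log>
    </logger>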

    So far, these scripts are rather simple and are effectively only usable for logging additional information. In a later blog post, we will look at how to use scripts in other situations. But for now, happy scripting!