Tag Archives: BEASTShell

BEASTShell String Tricks

30 June 2014 by Remco Bouckaert

Multi-line strings

Multi-line strings have always been a bit cumbersome in Java. Either you have to add strings in quotes separated by a + sign, like so:

	String s = "This is a multi-linen" +
		"string that takes a lot of quotesn"+
		"and escape characters";
	String [] parts = s.split("n");

or use a StringBuilder and append every string like so:

	String s = new StringBuilder().append("This is a multi-linen").
		.append("string that takes a lot of quotesn").
		.append("and escape characters").toString();
	String [] parts = s.split("n");

which is even more verbose. Perl supports multi-line strings and the above simplifies to

perl % $s = 'This is a multi-line
string without very much need
for extra escape characters';
perl % @parts = split('n', $s);

There is a proposal by Stephen Colebourne to incorporate multi-line strings into Java, but ala, so far that never made it (we are at Java 8 now). Fortunately, hidden in the parser, and absent from the documentation, there is this little gem which tells we can have multi-line strings in BEASTShell. The escape sequence is “””, so the Perl fragment in BEASTShell would be

bsh %  s = """This is a multi-line
string without very much need
for extra escape characters""";
bsh %  parts = s.split('n');

which is just one characters more than the Perl script! If you want quotes inside the multi-line string, no escape sequence is necessary, just insert them in the string:

bsh %  s = """Darwin's Finches
Homo "Sapiens"
Double quote=""""";
bsh %  parts = s.split('n');
// parts is now [Darwin's Finches, Homo "Sapiens", Double quote=""]

bsh %  s = """Darwin's Finches
Homo "Sapiens"
Single quote="""";
bsh %  parts = s.split('n');
// parts is now [Darwin's Finches, Homo "Sapiens", Single quote="]

(which surely upsets the code colouring algorithm:-)) For triple quotes you still need to add two strings

bsh % s = """Darwin's Finches
Triple quote=""""" + """;
bsh %  parts = s.split('n');
// parts is now [Darwin's Finches, Triple quote="""]

but, he, would needs that?

The ugliness of regular expressions

Another eye-sore is Java syntax is regular expression handling. Look at this fragment, almost completely from the Java documentation:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherDemo {

    private static final String REGEX = "\bdog\b";
    private static final String INPUT = "dog dog dog doggie dogg";

    public static void main(String[] args) {
       Pattern p = Pattern.compile(REGEX);
       //  get a matcher object
       Matcher m = p.matcher(INPUT);
       int count = 0;
       while(m.find()) {
           count++;
           System.out.println("match " + count + " = " + INPUT.substring(m.start(), m.end()));
      }
   }
}

(The original code in the documentation prints the m.start() and m.end(), but why would one be interested in the location instead of the matching string?) The equivalent in BEASTShell would be a bit shorter:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

REGEX = "\bdog\b";
INPUT = "dog dog dog doggie dogg";

p = Pattern.compile(REGEX);
//  get a matcher object
m = p.matcher(INPUT);
count = 0;
while(m.find()) {
   count++;
   print("match " + count + " = " + INPUT.substring(m.start(), m.end()));
}

But in Perl this can be done even shorter:

$REGEX =  "\b(dog)\b";
$INPUT =  "dog dog dog doggie dogg";

$count = 0;
while ($INPUT =~ /$REGEX/g) {
	$count++;
    print("match $count = $1n");
}

Gedit counts these fragments as follows

#words #characters #characters
without whitspace
Java 68 400 548
BeanShell 46 262 313
Perl 22 120 151

In summary, regular expressions in Java and — to a lesser extent — BEASTShell are very verbose and ugly.

Improved regular expressions

BEASTShell has a regexp command that returns a list of matches, which can be used like so:

$REGEX =  "\b(dog)\b";
$INPUT =  "dog dog dog doggie dogg";

for(s : regexp(INPUT,REGEX)) {   
	print("match " + (++count) + " = " + s);
}

And now we can extend the table of Gedit counts:

#words #characters #characters
without whitspace
Java 68 400 548
BeanShell 46 262 313
Perl 22 120 151
BEASTShell 19 109 141

This is of course a bit cheating, because we can hide any complex function inside a command. But I think this is justified here since regular expression matching is common enough and verbose enough in Java/BeanShell that a few extra commands are very helpful. Also, it shows how an effectively defined command can help streamline your scripts.

Filter *BEAST analysis

Let’s put this to work to select a set of species taxa in an existing *BEAST analysis. First, define the original species and their sequences.

// the original species and their sequences
s="""Cyanocorax_mystacalis = {GU144828 GU144829 GU144831 GU144830}
C.validus = {JQ023974} 
Urocissa_erythrorhyncha = {JQ864482}
C.capensis = {JQ023977}
Cyanocorax_yucatanicus = {DQ912613 GU144848}
Perisoreus_canadensis = {JQ655939 JQ656012 JQ656006 JQ655975} """;

// one species per entry, so we can process s line by line
data = s.split("n");

// the new set of species 
filter ="""Urocissa_erythrorhyncha
Perisoreus_canadensis
Cyanocorax_yucatanicus""";

Then, we define a command to print out a new taxonset, using regexp to get info out of the string:

newtaxon(s, filter) {
	x = regexp(s, "(.*)=.*{(.*)}.*");
	//if (!filter.matches("(?s).*\b"+Matcher.quoteReplacement(x[1].trim())+"\b.*")) {
	if (!filter.matches("(?s).*\b"+x[1].trim()+"\b.*")) {
		return;
	}
	print("");
	for (id : x[2].trim().split("\s+")) {
		print("    ");
	}
	print("");
}

for (s : data) {newtaxon(s, filter);}

Assuming the original sequences are in a file called source.xml, we can grab the sequences from the file using:

newsequence(s, filter) {
	x = regexp(s, "(.*)=.*{(.*)}.*");
	if (!filter.matches("(?s).*\b"+x[1].trim()+"\b.*")) {
		return;
	}
	for (id : x[2].trim().split("\s+")) {
		exec("grep "+ id + " source.xml | grep sequence");
	}
}

for (s : data) {newsequence(s, filter);}

Roll your own models with BEASTShell

9 June 2014 by Remco Bouckaert

BEASTShell contains a number of classes that can be customised using script fragments, similar to the beast.util.Script class in BEASTLabs that we talked about before. This means that you can create customised loggers, substitution models, clock models, and a few other classes just by changing the XML that specifies a BEAST analysis.

The following table shows the classes, all in the beast.shell package, and the functions that the script is required to specify.

Class Base class/interface Required functions
BSHClockModel Base getRateForBranch(node, meanrate)
BSHDistribution Distribution calculateLogP()
BSHLogger Loggable init(out), log(sample, out), close(out)
BSHOperator Operator double proposal()
BSHParametricDistribution ParametricDistribution density(x), cumulativeProbability(x), inverseCumulativeProbability(p)
BSHSiteModel SiteModelInterface.Base getCategoryCount(), getCategoryOfSite(site, node), getRateForCategory(category, node), getCategoryRates(node), getProportionForCategory(category, node), getCategoryProportions(node)
BSHSubstitutionModel SubstitutionModel.Base getTransitionProbabilities(node, fStartTime, fEndTime, fRate, matrix)
BSHTreeDistribution TreeDistribution calculateLogP(tree, intervals)

Each of these classes have an input called x, which is a list of Functions. By default, the ID of a function is used as global variable in the script, but you can assign a name to a Function using the beast.shell.NamedFunction class. The following

	

assigns the name “param” to the value of birthRate.t:alignment, and when using the variable “param” in the script it will take on the value of birthRate.t:alignment.

Each of the script classes also have a value input, which can be used to specify a script. All inline text in an element will be passed to the value-input, which means you can put your script inside a CDATA block in the XML inside the XML element representing the script-object. For example, a BSHDistribution that is proportional to 1/X can be defined as this:


    
    calculateLogP() {return Math.log(1.0/param.getValue());}

Note that to access “param”, which implements Function, we need to call Function methods, like getValue.

For BSHDistributions, we only need to specify calculateLogP, but for BSHLoggers, we need to specify three functions: init (typically used for printing the header in the trace file), log (to print a log value) and close (which for trace files means doing nothing, but for tree files an “End;” is expected). When any of these functions is not specified, an exception is thrown at the time the corresponding method is called. Likewise, an exception is thrown if there is an error in the program or any other unexpected condition is encountered.

To the following specifies a BSHLogger that reports the square of a value if its value is less than 1, otherwise it reports the value.


    

Here, we use a CDATA block so that we do not have to worry about escaping special characters like the < in if (param.getValue() < 1) .... Otherwise, you would have to escape those characters, giving if (param.getValue() &lt; 1) ... making things quite hard to read. All text within the CDATA block is taken as literal text and passed to the value-input.

BEASTShell recognises some XML docment friendly tokens that are mapped on operators, so you can write if (param.getValue() @lt 1) ... instead. The full set of tokens is listed below.

token interpreted as
@gt >
@lt <
@lteq <=
@gteq >=
@or ||
@and &&
@bitwise_and &
@bitwise_or |
@left_shift <<
@right_shift >>
@right_unsigned_shift >>>
@and_assign &=
@or_assign |=
@left_shift_assign <<=
@right_shift_assign >>=
@right_unsigned_shift_assign >>>=

I prefer the CDATA block, but you know about some alternatives now. You do not need to limit yourself to the functions required by the BSH-classes, but can define any other function to be used by the script as well.

The following demonstrates usage of the BSHOperator as a scale-operator on a RealParameter:


    
 param.getUpper()) {
                return Double.NEGATIVE_INFINITY;
        }
        param.setValue(new Double(newvalue));
        return 0.0;
}
]]>

Note how it sets the value of the parameter, which it can do only because the input is a RealParameter but may fail when other objects are used as “param”.

Simulation Studies with BEASTShell

2 June 2014 by Remco Bouckaert

A few weeks ago, we had a look at a simultion study with BEAST, using the SequenceSimulator, LogAnalyser and R. Let’s do that study again, but now with BEASTShell, a package for scripting BEAST-objects. It is based on BeanShell, but has special extentions to conveniently deal with BEASTObject classes.

In the previous blog, I sampled sequences for a range of shape parameter values that seemed reasonable in my experience. However, the model assumes an exponential prior distribution over the shape parameters with mean 1, so arguably, that information is sufficient to draw shape values from. In this study, we will

  1. sample 100 values of the shape parameter from this prior distribution,
  2. sample an alignment with 10.000 characters using 4 gamma categories and one of the sampled shape paramter values
  3. run the analysis on the sampled data
  4. record the estimate of the shape parameter
  5. create a plot of sampled against estimated shape values

Let’s get started with BEASTShell. The easiest way to create scripts is using BEASTStudio, the BEASTShell GUI. First, we import a bunch of packages we are going to need. BEASTShell already loads a nuber of packages (javax.swing.event, javax.swing, java.awt.event, java.awt,java.net, ava.util,java.io, and java.lang) which makse it easy to create standard class instances.

import beast.evolution.substitutionmodel.*;
import beast.evolution.sitemodel.*;
import beast.evolution.alignment.*;
import beast.util.*;
import beast.core.*;
import com.xeiam.xchart.*;
import beast.app.shell.*;

First, we draw gamma shape from the prior (exp(1.0))

shapes = rexp(100, 1.0);
estimate = new ArrayList();

Newx, we set up a model to draw samples from

data = new Alignment(
		sequence=new Sequence(taxon="human",value="?"),
		sequence=new Sequence(taxon="bonobo",value="?"),
		sequence=new Sequence(taxon="chimp",value="?")
	);
tree = new beast.util.TreeParser(newick="(human:0.2,(chimp:0.15,bonobo:0.15):0.05)", taxa=data, IsLabelledNewick=true);
hky = new HKY(kappa="5.0", frequencies=new Frequencies(data=data));
clockmodel = new beast.evolution.branchratemodel.StrictClockModel("1.0");

For each of the shape values (shape0 in the following), we set up a site model with gamma shape parameter

	sitemodel = new SiteModel(gammaCategoryCount=4, substModel=hky, shape=""+shape0);

Then, we specify which XML analysis, (we prepared in the previous blog) we want to use to analyse the generated alignment, using

	mergewith = new beast.app.seqgen.MergeDataWith(template="analysis.xml", output="analysis-out.xml");

Now, we have all the components ready to set up the sequence simulator:

	sim = new beast.app.seqgen.SequenceSimulator(data=data, tree=tree, sequencelength=10000, outputFileName="gammaShapeSequence.xml", siteModel=sitemodel, branchRateModel=clockmodel, merge=mergewith);
	sim.run();

This produces gammaShapeSequence.xml and merge it with analysis.xml to get analysis-out.xml. Now we can run the analysis using:

	mcmc = (new XMLParser()).parseFile(new File("analysis-out.xml"));
	mcmc.run();

After this finished in a minute or so, we grab estimates from the analysis using the LogAnalyser, and store it in the estimate list.

	log = new LogAnalyser(new String[]{"SequenceSimulator.log"}, 2000, 10);
	s = log.getMean("gammaShape");
	estimate.add(s);

After we have done this for all generated shape parameter values, we can draw a scatter plot and save to “Sample_Chart.png”, using

p = new Plot(x=shapes, y=estimate, title="Gamma shapensimulation", xAxisTitle="original", yAxisTitle="estimate", chartType="Scatter", isYAxisLogarithmic=true, isXAxisLogarithmic=true, output="png");

This analysis just overwrites log files for each generated alignment. To tell BEAST to overwrite, we need

Logger.FILE_MODE = beast.core.Logger.LogFileMode.overwrite;

Furthermore, we want to use BEAGLE, but dependeing on your hardware, this may not speed things up. Setting BEAGLE flags can be done like so:

beagleFlags = beagle.BeagleFlag.VECTOR_SSE.getMask() | beagle.BeagleFlag.PROCESSOR_CPU.getMask();
System.setProperty("beagle.preferred.flags", Long.toString(beagleFlags));

Running the analysis

So, that gives us the following complete script.

// This script demonstrates a simulation study for the
// shape parameter in the gamma rate heterogeneity model.
// Requirements: the file analysis.xml must be in the directory
// from where this script is run.

import beast.evolution.substitutionmodel.*;
import beast.evolution.sitemodel.*;
import beast.evolution.alignment.*;
import beast.util.*;
import beast.core.*;
import com.xeiam.xchart.*;
import beast.app.shell.*;

Logger.FILE_MODE = beast.core.Logger.LogFileMode.overwrite;

// set up flags for BEAGLE -- YMMV
beagleFlags = beagle.BeagleFlag.VECTOR_SSE.getMask() | beagle.BeagleFlag.PROCESSOR_CPU.getMask();
System.setProperty("beagle.preferred.flags", Long.toString(beagleFlags));

// draw gamma shape from the prior (exp(1.0))
shapes = rexp(100, 1.0);
estimate = new ArrayList();

// set up model to draw samples from
data = new Alignment(
		sequence=new Sequence(taxon="human",value="?"),
		sequence=new Sequence(taxon="bonobo",value="?"),
		sequence=new Sequence(taxon="chimp",value="?")
	);
tree = new beast.util.TreeParser(newick="(human:0.2,(chimp:0.15,bonobo:0.15):0.05)", taxa=data, IsLabelledNewick=true);
hky = new HKY(kappa="5.0", frequencies=new Frequencies(data=data));
clockmodel = new beast.evolution.branchratemodel.StrictClockModel("1.0");

k = 0;
for (shape0 : shapes) {
	sitemodel = new SiteModel(gammaCategoryCount=4, substModel=hky, shape=""+shape0);
	//sim = new beast.app.seqgen.SequenceSimulator(data=data, tree=tree, sequencelength=10000, outputFileName="gammaShapeSequence.xml", siteModel=sitemodel, branchRateModel=clockmodel);
	// produce gammaShapeSequence.xml
	//sim.run();
	
	mergewith = new beast.app.seqgen.MergeDataWith(template="analysis.xml", output="analysis-out.xml");
	sim = new beast.app.seqgen.SequenceSimulator(data=data, tree=tree, sequencelength=10000, outputFileName="gammaShapeSequence.xml", siteModel=sitemodel, branchRateModel=clockmodel, merge=mergewith);
	// produce gammaShapeSequence.xml and merge with analysis.xml to get analysis-out.xml
	sim.run();

	// run the analysis
	mcmc = (new XMLParser()).parseFile(new File("analysis-out.xml"));
	mcmc.run();

	// grab estimate from the analysis
	log = new LogAnalyser(new String[]{"SequenceSimulator.log"}, 2000, 10);
	s = log.getMean("gammaShape");
	print("iteration: " + (k++) + " shape = " + shape0 + " estimate = " + s);
	estimate.add(s);
}

// create scatter plot and save to /tmp/Sample_Chart.png
p = new Plot(x=shapes, y=estimate, title="Gamma shapensimulation", xAxisTitle="original", yAxisTitle="estimate", chartType="Scatter", isYAxisLogarithmic=true, isXAxisLogarithmic=true, output="png");

You can run the analysis in BEASTStudio, but you can also run it from the command line, which is probably a bit faster. BEASTShell has a command line interpreter that can be started with

java -cp beast.jar:bsh.jar bsh.Interpreter examples/simstudy.bsh

Once you ran the script, something similar to the following image should appear in the file Sample_Chart.png.

Looks like the shape parameter can be fairly reliably recovered from a large range of shape values. Only at the extremes — below 0.05 and over 15 — the prior is drawing the estimate to the mean.

Introducing BEASTShell

26 May 2014 by Remco Bouckaert

BEASTShell is a scripting language based on BeanShell 2.2 which itself is based on BeanShell. The goal of BeanShell is to provide a scripting environment supporting Java, and consequently its syntax is very much like Java. The main difference is that it is a scripting language, which means we get dynamic typing. Furthermore, there is an interactive interpreter, so you can type things like

bsh % 3+4;

bsh % print("Hello world!"); // print() is a BEASTShell command
Hello world!
bsh % "Hello world!";

bsh % x = 1 + 2 + 3 + 4;

bsh % for (i = 0; i < x; i++) 
		print(i);
0
1
2
3
4
5
6
7
8
9
bsh % 

Here bsh % is the prompt (which you can customise), and the remainder is output to the console. Simple expressions produce the result on the console if show() is on — which it is by default. The second line is the canonical hello-world program, which is quite a bit shorter than what you would need in Java! It can even be shortened to
"Hello world"; since the console print the value of expressions on screen, and the value of "Hello world"; is a String with its familiar content as value.

Note how we use print as a command. Commands are pre-defined functions that come from script files in the class-path. BEASTShell has many such commands, but before we dive in, let’s first install BEASTShell.

Installing BEASTShell

BEASTShell is a BEAST package, so to run BEASTShell you need to install BEAST first (available here). Then, you need to install the BEASTShell package. Start BEAUti and select the menu File/Manage Packages and a window pops up where you can select the BEASTShell package and click the “Install” to install the package. It is about 4MB at the moment, which may need a bit to get downloaded, so it may take a little time for the package to become available.

BEASTStudio

BEASTStudio is a basic IDE for BEASTShell and is probably a good starting point to get familiar with BEASTShell. There are many other ways to run BEASTShell scripts, including in a text-only console, in programs, as servlet, etc., but BEASTStudio is the easiest to get started. There is lots of help, demos and documentation, as well as editors with syntax colouring, a history of everything in the console and a variable inspector showing all available variables and their values. So, there is lots of interactive feedback when exploring BEASTShell programs.


BEASTStudio requires JavaFX

BEASTStudio relies on JavaFX, which means you need to run Java 7 or better, (or install JavaFX separately for Java 6, which only appears
to work on windows
). If you run Java 8, JavaFX is loaded automatically, and you do not need to worry about it. If you run Java 7, it depends on the update whether it loads JavaFX or not.

Regardless, BEASTStudio attempts to load JavaFX if it is not already loaded as follows:

  • If the JAVA_HOME environment variable is defined, it attempts to load $JAVA_HOME/jre/lib/jfxrt.jar
  • If that fails, if looks for the JAVAFX_JAR environment variable
    and attempts to load JAVAFX_JAR. So, if you specify JAVAFX_JAR=/usr/java/lib/jfxrt.jar then BEASTStudio attempts to load /usr/java/lib/jfxrt.jar.
  • If both of the above fail, it tries to load from /opt/java/jre/lib/jfxrt.jar.

If none of the above succeed, BEASTStudio will fail with an error message.

Running BEASTStudio

To start BEASTStudio, open the AppStore in BEAST and when BEASTShell is properly installed there should be an entry for BEASTStudio. Double click the icon, or select and click “launch” and BEASTStudio should display itself after a few seconds. If it does not, but the splash screen does, some libraries (e.g. JavaFX) may be missing.

Creating BEASTObjects

One of the main difference of BEASTShell is in how it treats classes that are derived from the BEASTObject class. To create a BEASTObject in Java requires calling the constructor of the object, initialising each of its inputs then
calling initAndValidate(). BEASTShell does these three steps in one single statement, for example,

import beast.evolution.alignment.*;
human = new Sequence(taxon="human", value="?"),

is the BEASTShell equivalent of

import beast.evolution.alignment.*;

... {
Sequence human = new Sequence();
human.taxonInput.setValue("human", human);
human.dataInput.setValue("?", human);
human.initAndValidate();
}

Note that the name-value pairs in creating the BEASTObject — these are independent of the order. If you leave them out, the order of inputs is used (you can check by showing the BEAST documentation with ?Sequence), which is more error prone since it can change in the future. Value do not necessarily need to be primitive values, you can use more complex objects as well. For example, a two sequence alignment can be created using

human = new Sequence(taxon="human", value="?"),
chimp = new Sequence(taxon="chimp", value="?"),
data = new Alignment(sequence=human, sequence=chimp);

Calculating tree statistics

One of the things we hoped to make easier (see previous post) is to calculate statistics on trees, such as the length of a tree. We can define the length of a tree for example like so

len(node) {
  if (node.isLeaf()) {
    return node.getLength();
  else 
    return len(node.getLeft()) + len(node.getRight()) + node.getLength();
}

// to test the len method, create a tree
tree = new beast.util.TreeParser(newick="(1:0.2,(2:0.15,3:0.15):0.05)");
len(tree.getRoot());
// returns 0.55

This implementation recurses through the (presumed binary) tree with just a few lines of code. It also demonstrates how to create a tree using the TreeParser BEASTObject. If import beast.util.*; was inserted at the start, the even been shorter tree = new TreeParser(newick="(1:0.2,(2:0.15,3:0.15):0.05)"); would be sufficient.

If the tree is not binary, we can use the following for non-binary trees:

// implementation that recurses through the (potentially non-binary) tree
len2(node) {
  if (node.isLeaf()) {
    return node.getLength();
  } else {
    len = node.getLength();
    for (child : node.getChildren()) {
      len += len2(child);
    }
    return len;
  }
}

len2(tree.getRoot());
// still returns 0.55
// now define a non-binary tree
tree2 = new beast.util.TreeParser(newick="(1:0.2,((2:0.1):0.1))", singlechild=true);
len2(tree2.getRoot());
// returns 0.40

An alternative implementation that loops over the nodes in just 4 lines of code:

len3(tree) {
  len = 0;
  for (node : tree.getNodesAsArray())
      len += node.getLength();
   return len;
}

len3(tree);
// returns 0.55
len3(tree2);
// returns 0.40

Getting Help

The parser recognises lines starting with question marks as cries for help. A single question mark followed by an expression brings up help on the expression (in BEASTStudio this is shown in the help panel). A double question mark followed by an object or class shows public fields and methods for that object or class (again, in the help panel when running BEASTStudio).

To get help on a command, use help(“cmd”); of alternatively ?cmd e.g.

bsh % help("print");
bsh % ?print
bsh % help("dnorm");
bsh % ?dnorm

To get help on a BEASTObject, use help(BEASTObject); e.g.

bsh %  ?beast.core.MCMC
or, if you already created an MCMC object, asking help for the object will have the same results:
bsh % tree = new beast.evolution.tree.Tree()
bsh % ?tree
bsh % javap(tree)
bsh % ??tree

To get help on any other class, use help(Class); e.g.

bsh %  help(java.util.ArrayList);
bsh %  ??java.util.ArrayList;
bsh %  data = new ArrayList();
bsh %  ?data

Running demos

There are a few demos that show off capabilities of BEASTShell. To list the set of demos, use demo();. There are four demos at the moment of writing:

  • beanshell: a quick primer on BeanShell syntax
  • chart: demonstrates various 2D charts x
  • trace: demonstrates how to interactively inspect traces
  • trees: demonstrates various trees and how to display them

To run a demo, use demo(demoname); e.g. demo("chart"); to start the chart-demo.

BEASTShell commands

There is already a large number of commands that come with BEASTShell, divided into three categories:

  • BEAST commands which are mostly specific to BEAST.
  • BeanShell commands which are inherited from BeanShell. Some commands are deleted or adjusted for BEASTShell.
  • Math, statistical and utility functions for general purpose usage.

These are the BEAST commands (at present):

demo demonstrate some BEASTShell capabilities in BEASTStudio
help provide online help
c construct a list of objecst, e.g. x=c(1,1.5,2,2.4)
assertEquals handy for testing, e.g. assertEquals(dbeta(0.4, 5, 1, false), 0.128, 0.0001);
beast start a BEAST run
beauti start a BEAUti session
appstore start the BEAST AppStore
requires indicate some BEAST package is required, and install if not present.
trace plot trace and show statistics of a BEAST trace file.
edit edit BEASTObject through a dialog.
plot plot component to plot panel or to file.
regression calculate regression line through a series of points.

The BeanShell commands are mostly commands for starting scripts and performing shell-like functions. From the BeanShell documentation, these are a few useful commands:

  • source(), run() – Read a bsh script into this interpreter, or run it in a new interpreter
  • frame() – Display a GUI component in a Frame or JFrame.
  • load(), save() – Load or save serializable objects to a file.
  • cd(), cat(), dir(), pwd(), etc. – Unix-like shell commands
  • exec() – Run a native application
  • javap() – Print the methods and fields of an object, similar to the output of the Java javap command.
  • setAccessibility() – Turn on unrestricted access to private and protected components.

The over 125 math and stat commands are inspired by R’s stats and base package, and include things like rnorm to generate random numbers from the normal distribution, max to get the maximum of a list of numeric objects, or an array of objects, sum to sum a list or array of numbers, etc.

Graphs and tree-plots with BEASTStudio

BEASTShell contains a number of classes for 2D-charting and drawing trees. A few examples are shown below. To create a plot, use the beast.app.shell.Plot class. You can add various series to a plot using the beast.app.shell.Series class. For more information and options available for plots, use the help command

?beast.app.shell.Plot
?beast.app.shell.Series

Also, check out the code used in the chart-demo.

To create plots of trees, check out the classes beast.graphics.RootedTreeDrawing for single rooted trees, beast.graphics.UnrootedTreeDrawing for unrooted trees and beast.graphics.TreeDrawingGrid to draw multiple trees on a single plot.

All of the classes mentioned here will produce output in the plot-pane of BEASTStudio when new objects of them are created.

Learning more

The best way to learn more is to run the demos — read the output on the console while running the demo. Read the documentation and browse the help screen in BEASTStudio.

Next week, we will have a look on how to perform the simulation study we did before, but with much less hassle using BEASTShell.

Scripting languages for BEAST

19 May 2014 by Remco Bouckaert

In a previous blog, we looked at ways in which to program BEAST without having to program in Java, but use some scripting capability and specify formulas and functions in the XML specification. What we saw so far was pretty much limited to writing loggers (and a substitution model). Scripting capabilities are pretty limited, but it would be nice to be able to extend and use BEAST and its libraries without having to delve into Java, and the particular conventions used in BEASTObject classes.

Why scripting: scripting languages are dynamically typed thus simplifying code, and increasing powerful in that few lines of code can produce a lot of functionality, unlike the verbose Java language. When performance is an issue, Java implementations can be considered, noting that most of the time of MCMC calculations in BEAST tend to be in the treelikelihood, which is too complex to script. The BEAST framework can take care of most of tracking which part of the model requires recalculation after an MCMC step, so again this is not something that a script needs to pay too much attention to. Also, scripting provides a fast way to write tests. Furthermore, a script language can be used in an interactive console, which allows exploring various outputs of BEAST analysis.

Scripting is already possible through ScriptEngine implementations in Java, so you can create and access BEAST-objects that way. However, it is quite inconvenient to do so, since setting values of inputs is awkward. A successful scripting implementation should do the following:

  1. Provide a convenient way to create and initialise BEASTObjects, for example, creating a tree from a Newick string.
  2. Provide a convenient way to access Tree structures, e.g. calculate the total length of branches in a clade
  3. Provide a convenient way to implement standard BEASTObjects, such as Distributions, SubstitutionModels, TreeDistributions, and BranchRateModels that can be integrated with BEAST models in an MCMC analysis.
  4. Provide a convenient way to write junit tests.

Syntax

The first choice to make is which syntax to use — roll your own, or use an existing syntax. As much fun as it is to write parsers and design new languages, having Yet Another Scripting Language is not something the world is waiting for. Furthermore, there are plenty languages out there that already are well known and are expressive, well known and convenient enough to do the kind of things we want. Furthermore, there are scripting engines out in the wild that are already optimised for speed — though their default methods of integrating with Java might not suite the way BEAST was designed.

So, the next question becomes: which language should we use? We already disregarded compiled languages such as Java, C++, C, Objective C, C#, etc. There are heaps of interpreted languages out there — Python, Ruby, JavaScript — and statistical scripting languages — WinBUGS, R, stan, matlab, mathematica JAGS.

Popularity: Python, JavaScript, Ruby, Perl, VB and PHP appear currently the most popular according to a number of rankings, including tiobe, lang pop, and code eval blog. Readwrite has 5 ways of measuring language popularity, and again these 6 scripting languages score highest among most rankings. Among the statistical scripting languages, R has gained significant popularity in the last few years. Redmonk, has a interesting plot of languages from stackoverflow and github showing the popularity or R as well as the general purpose scripting languages.

Popularity of scripting languages is important in reducing the average learning curve; a popular syntax is probably familiar for users because they have seen them before.

Standard implementations

The Bean scripting framework and JSR-223 API are the two standard ways of integrating Java and scripting languages. The Bean framework appears dormant (latest release 2011), so let’s have a look at how some of the ScriptEngine implementations do Java-integration.

JavaScript: rhino is a Javascript engine, embedded in J2SE 6. Nashorn is Javascript engine with increased performance, but only available for Java 8. Rhino has a rich set of methods of accessing Java (see documentation). It can access methods of JavaBeans defined as getFoo(), setFoo(), isFoo() as JavaSCript properties. So, let f be a file, then f.name will call f.getName() and f.directory will call f.isDirectory().

Python: Jython a Python interpreter in Java with extensive Java scripting capabilities.

Ruby: JRuby is a Ruby interpreter in Java, also with scripting capabilities.

R: Renjin is an R interpreter in Java written with efficiency in mind, and can be used for scripting with Java. It is aware of JavaBeans, so you can initialise a class like so:

bob <- Customer$new(name='Bob', age=36)

and it will create an object of class Customer using the empty constructor, then call setName("Bob") followed by setAge(36). This comes very close to the way we use initByName method of BEASTObject, which in Java would be

Customer bob = new Customer();
bob.initByName("name","Bob", "age", 36)

So, if the Renjin engine could be coerced into calling the code above, we would have a convenient way to create BEASTObjects.

Fastr is another R interpreter in Java, but it does not implement a ScriptEngine, and also does not support importing of R packages, while renjin does.

Stan, matlab, and mathematica have no interpreters written in Java that I am aware of.

BEASTShell

After spending quite a bit of time looking at the existing languages, an unexpected alternative presented itself: BeanShell, a scripting language with Java syntax, running on the JVM and largely forgotten. However, any Java developer will not have any problems picking up the language — it uses largely the same syntax as Java! Furthermore, it runs on the JVM and has facilities to work with Java libraries. Also, the code is sufficiently clear that it is easy to coerce the language into creating BEASTObjects in a convenient way. The resulting language is called BEASTShell. In the next post, we will have a closer look at the details and see what we can do with BEASTShell.