19 May 2014 by Remco Bouckaert
In a previous blog, we looked at ways in which to program BEAST without having to program in Java, but use some scripting capability and specify formulas and functions in the XML specification. What we saw so far was pretty much limited to writing loggers (and a substitution model). Scripting capabilities are pretty limited, but it would be nice to be able to extend and use BEAST and its libraries without having to delve into Java, and the particular conventions used in BEASTObject classes.
Why scripting: scripting languages are dynamically typed thus simplifying code, and increasing powerful in that few lines of code can produce a lot of functionality, unlike the verbose Java language. When performance is an issue, Java implementations can be considered, noting that most of the time of MCMC calculations in BEAST tend to be in the treelikelihood, which is too complex to script. The BEAST framework can take care of most of tracking which part of the model requires recalculation after an MCMC step, so again this is not something that a script needs to pay too much attention to. Also, scripting provides a fast way to write tests. Furthermore, a script language can be used in an interactive console, which allows exploring various outputs of BEAST analysis.
Scripting is already possible through ScriptEngine implementations in Java, so you can create and access BEAST-objects that way. However, it is quite inconvenient to do so, since setting values of inputs is awkward. A successful scripting implementation should do the following:
- Provide a convenient way to create and initialise BEASTObjects, for example, creating a tree from a Newick string.
- Provide a convenient way to access Tree structures, e.g. calculate the total length of branches in a clade
- Provide a convenient way to implement standard BEASTObjects, such as Distributions, SubstitutionModels, TreeDistributions, and BranchRateModels that can be integrated with BEAST models in an MCMC analysis.
- Provide a convenient way to write junit tests.
The first choice to make is which syntax to use — roll your own, or use an existing syntax. As much fun as it is to write parsers and design new languages, having Yet Another Scripting Language is not something the world is waiting for. Furthermore, there are plenty languages out there that already are well known and are expressive, well known and convenient enough to do the kind of things we want. Furthermore, there are scripting engines out in the wild that are already optimised for speed — though their default methods of integrating with Java might not suite the way BEAST was designed.
Popularity of scripting languages is important in reducing the average learning curve; a popular syntax is probably familiar for users because they have seen them before.
The Bean scripting framework and JSR-223 API are the two standard ways of integrating Java and scripting languages. The Bean framework appears dormant (latest release 2011), so let’s have a look at how some of the ScriptEngine implementations do Java-integration.
Python: Jython a Python interpreter in Java with extensive Java scripting capabilities.
Ruby: JRuby is a Ruby interpreter in Java, also with scripting capabilities.
R: Renjin is an R interpreter in Java written with efficiency in mind, and can be used for scripting with Java. It is aware of JavaBeans, so you can initialise a class like so:
bob <- Customer$new(name='Bob', age=36)
and it will create an object of class Customer using the empty constructor, then call setName("Bob") followed by setAge(36). This comes very close to the way we use initByName method of BEASTObject, which in Java would be
Customer bob = new Customer(); bob.initByName("name","Bob", "age", 36)
So, if the Renjin engine could be coerced into calling the code above, we would have a convenient way to create BEASTObjects.
Fastr is another R interpreter in Java, but it does not implement a ScriptEngine, and also does not support importing of R packages, while renjin does.
Stan, matlab, and mathematica have no interpreters written in Java that I am aware of.
After spending quite a bit of time looking at the existing languages, an unexpected alternative presented itself: BeanShell, a scripting language with Java syntax, running on the JVM and largely forgotten. However, any Java developer will not have any problems picking up the language — it uses largely the same syntax as Java! Furthermore, it runs on the JVM and has facilities to work with Java libraries. Also, the code is sufficiently clear that it is easy to coerce the language into creating BEASTObjects in a convenient way. The resulting language is called BEASTShell. In the next post, we will have a closer look at the details and see what we can do with BEASTShell.