All posts by rbou019

What is new in v2.4.8

BEAST v2.4.8 is a patch release that fixes issue #736. When using RNA data, a bug caused the BeagleTreeLikelihood class to interpret ‘U’ characters in the sequence as missing data instead of treating them like the ‘T’ character in DNA. As a result, the likelihood was calculated incorrectly when using the BEAGLE library. This is not a bug in BEAGLE itself, but in the BeagleTreeLikelihood class in BEAST that interfaces with BEAGLE, so it does not affect other software that uses BEAGLE, such as BEAST 1 and MrBayes.

If you used BEAST with RNA data, have BEAGLE installed and used BEAGLE in the analysis, this affects your analysis.

This does not affect DNA or amino acid data with or without use of BEAGLE.

Also, this does not affect analyses with RNA data when using the -java option for BEAST, or when you don’t have BEAGLE installed (BEAST attempts to use BEAGLE if installed by default).
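To illustrate the kind of mistake involved, here is a minimal Python sketch of encoding nucleotide characters as likelihood states. It is not BEAST's or BEAGLE's actual code: the state table, the MISSING marker and the function name are hypothetical. Treating ‘U’ as unknown throws away the information at those sites, whereas mapping it to the same state as ‘T’ keeps it.

```python
# Hypothetical state codes for illustration only; BEAST's real encoding may differ.
STATES = {"A": 0, "C": 1, "G": 2, "T": 3}
MISSING = -1  # stands for "any state", i.e. missing data


def encode(sequence, treat_u_as_t=True):
    """Map each character to a state code, optionally treating 'U' like 'T'."""
    codes = []
    for ch in sequence.upper():
        if ch == "U" and treat_u_as_t:
            ch = "T"
        codes.append(STATES.get(ch, MISSING))
    return codes


rna = "ACGU"
buggy = encode(rna, treat_u_as_t=False)    # 'U' ends up as missing data
intended = encode(rna, treat_u_as_t=True)  # 'U' shares the state of 'T'
print(buggy, intended)
```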

To tell whether you are using the BeagleTreeLikelihood or the java TreeLikelihood, BEAST shows at the start which TreeLikelihood is used. If you have BEAGLE installed and use it, it should show a message similar to this:

Using BEAGLE version: 2.1.2 resource 0: CPU
with instance flags: PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_SSE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
Ignoring ambiguities in tree likelihood.
Ignoring character uncertainty in tree likelihood.
With 69 unique site patterns.
Using rescaling scheme : dynamic

If you use the -java option, or do not have BEAGLE installed, it shows:

TreeLikelihood(treeLikelihood) uses BeerLikelihoodCore4
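If you kept the screen output of an old run, the check can be automated. The following is a rough sketch that only looks for the two kinds of start-up message quoted above; the function name is made up, and the exact wording of the messages may differ between BEAST versions.

```python
def likelihood_backend(screen_output):
    """Return 'beagle', 'java' or 'unknown' based on BEAST's start-up messages."""
    for line in screen_output.splitlines():
        text = line.strip()
        if text.startswith("Using BEAGLE version"):
            return "beagle"
        if text.startswith("TreeLikelihood(") and " uses " in text:
            return "java"
    return "unknown"


beagle_log = "Using BEAGLE version: 2.1.2 resource 0: CPU\nWith 69 unique site patterns."
java_log = "TreeLikelihood(treeLikelihood) uses BeerLikelihoodCore4"
print(likelihood_backend(beagle_log), likelihood_backend(java_log))
```

Only the first matching message is reported, so for an analysis with several partitions it is worth reading the whole output.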

What is new in v2.4.7

This release fixes a large number of small issues, outlined below.

BEAUti

Relabel two buttons because they were hard to find: “Guess” is now “Auto configure” in tip dates panel. “+” button on priors panel is now “+ Add Prior”.

More sensible default date value for taxa without date specified in tip dates panel. The old default put ‘zero’ as the default date, but for example when all data is from viruses sampled in the last 20 years, that default does not make sense, so it now becomes the date of the most recent sample.
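The new default can be pictured with a small sketch. It assumes dates are given as numbers that increase towards the present (for example calendar years); the function name and the example taxa are hypothetical and this is not BEAUti's code.

```python
def fill_missing_dates(dates):
    """Replace unspecified dates (None) by the most recent specified date."""
    known = [d for d in dates.values() if d is not None]
    if not known:
        return dict(dates)  # nothing to base a default on
    latest = max(known)
    return {taxon: (latest if d is None else d) for taxon, d in dates.items()}


sampled = {"virusA": 1998.5, "virusB": 2012.0, "virusC": None}
print(fill_missing_dates(sampled))
```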

Make sure the appropriate tree is used in MRCAprior when there are multiple trees in the analysis.

When there are many taxa with tip dates, it is convenient to specify these dates in the NEXUS file. The current version makes sure tips with ‘fixed’ distributions imported from Nexus do not get estimated.

Robustify NEXUS parser, an ever continuing task.

Taxa could be duplicated in taxon list in NexusParser, which could lead to numbered taxa. The duplication is prevented now.

Prevent accidental cloning of up-down-all operator when using the StarBeast template.

BEAST

Default locale set to English so full stops are used in NEXUS output.

Warn if Yule (or BD) conditions on root, but no root MRCAPrior is set.

Robustify resume

Suppress “Overwrite (Y/N)?” message when BEAST runs in console.

Stop chain when encountering a +infinity posterior.

Check that taxon set is specified when using RandomTree.

Normalise stateNodes so XML characters (‘”&<>) get escaped properly when writing state files.
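For readers unfamiliar with the escaping involved, here is a minimal sketch of the idea using Python's standard library rather than BEAST's own Java code. The helper name is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_for_state_file(text):
    """Escape the five XML special characters so the text is safe inside XML."""
    return escape(text, QUOTES)


print(escape_for_state_file("rate<1 & id='clock'"))
```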

Appstore

Improved formatting of app list.

TreeAnnotator

Now adds common ancestor height estimates as attributes.

API updates

Programmers should keep the following changes in mind:

Changed access levels of a few methods in nexus parser so you can derive your own from the NexusParser class.

Add Tree scale and ScaleOperator test.

Add new Tree constructor from root node, which is convenient when you have a tree represented by Nodes only.

Add support for input/output of non-binary trees.

Add Input.set() method, which is useful when setting input values as alternative to setValue, which requires two arguments: the value to set and the object containing the input. The latter is necessary for determining the type of the Input if that was not already determined. When

BeautiAlignmentProvider getAlignments method added to facilitate scripting (Beasy).

TreeParser now correctly parses tree edge length metadata, and has improved error reporting.

What is new in v2.4.6

There are mainly a few enhancements and bug fixes in BEAUti.

BEAUti

Starting trees can now be edited by showing the starting tree panel, which becomes visible by selecting the menu View/Show Starting tree panel. This is for the Standard template only, and does not work in the StarBeast template. It should be much less error prone to set up a different starting tree than editing the XML. Also, it is now possible to change the attributes of the random tree, like population size and maximum tree height, which makes it easier to get a starting tree that conforms to all constraints of the analysis, such as origin heights for advanced birth death tree priors.
There is a choice of random tree, which used to be the default, a cluster tree (UPGMA, neighbor joining, and a number of other standard hierarchical clustering algorithms) and a newick tree.

BEAUti now allows alignments to be replaced, so old analyses can be used for new data. If you need to run the same kind of analysis for many alignments this can save quite a bit of time. To replace an alignment, select an alignment in the partition panel and click on the small ‘r’ button at the bottom of the screen, next to the ‘+’, ‘-’ and ‘Split’ buttons. A file chooser dialog is shown where you can select an alignment file that will replace the one selected in the partition panel.

There is a fix for a FASTA file import bug that marked sequences as amino acid when they should have been marked as nucleotide. This happened when importing a FASTA file that was misclassified as an amino acid alignment, and a dialog was shown where you can change the type. Unfortunately, only the data type for the alignment was changed but not that of the sequences, leading to hard to diagnose problems.

When splitting alignment on codon positions, previously the tree was unlinked. So splitting into three partitions at codon positions 1, 2 and 3 resulted in adding three trees. Now, BEAUti keeps the trees linked, which makes more sense from a biological point of view.

In the Site model panel, BEAUti now automatically sets the estimate flag on the shape parameter when choosing more than one rate category. You can still fix the shape parameter by un-checking the checkbox again, but since this is not usual, the shape is now estimated by default.

BEAUti allows visualisation of alignments, which is triggered by double clicking an alignment in the partition panel. Display of integer alignments as used in microsatellite analyses is now possible.

BEAST

Better documentation by updates of descriptions of classes and improved error messages.

More robust XMLParser, which can now deal more robustly with BEASTObject classes using the Param annotation in constructors.

Bug fix that prevents double counting of offset-input in ParametricDistribution.sample.

Other

DensiTree version updated to v2.2.6.

What is new in v2.4.5

Replicability support

The biggest addition is that it is now possible to run an XML file with BEAST that uses
exactly the package versions that were used to create the XML in BEAUti. This means that
changes in the XML due to changes in package versions are no longer a problem, and you can
run an analysis with exactly those package versions of the original analysis.

The way it works is that BEAUti adds a “required” attribute on the beast-element in the
XML containing packages and their versions used to set up the analysis. Of course, you can
edit the XML by hand and change version numbers and add packages if you like. BEAST now has
a “-strictversions” flag, so when you start BEAST with that option it only loads packages and
versions as specified in the “required” attribute.

Of course, the versions of these packages must be installed for BEAST to be able to load
them. Therefore, the package manager in BEAUti now allows specifying specific versions of
the package to install, and multiple package versions can be installed side by side. By
default, the latest version of the package that is installed will be loaded, unless the
“-strictversions” flag is set. The addonmanager utility has a -version flag for specifying
the package version to install, if you prefer installing packages from the command line.

BEAUti

Previously, it was possible to edit priors, but these editing actions could interfere with
cloning substitution and clock models. A

When importing FASTA files, previously a single character other than A, C, G or T meant that
the alignment was classified as amino acid, even if it was a nucleotide alignment. This
version counts the number of non-A,C,G,T characters and makes a better guess based on that
number relative to the total number of characters in the alignment. Furthermore, a dialog pops up
where you can change that guess if it was incorrect. If many alignments of the same datatype
are loaded at the same time, you can choose to mark all as the same type so you don’t have
to close down the dialog for each of these alignments.
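The counting heuristic described above can be sketched as follows. This is an illustration only: the 20% threshold, the decision to skip gap and missing-data characters, and the function name are assumptions, not the values BEAUti uses.

```python
def guess_data_type(sequences, threshold=0.2):
    """Guess 'nucleotide' or 'aminoacid' from the share of non-ACGT characters.

    The threshold is a hypothetical value chosen for this example.
    """
    total = 0
    other = 0
    for seq in sequences:
        for ch in seq.upper():
            if ch in "-?":  # assumption: gaps and missing data are not counted
                continue
            total += 1
            if ch not in "ACGT":
                other += 1
    if total == 0:
        return "nucleotide"
    return "aminoacid" if other / total > threshold else "nucleotide"


print(guess_data_type(["ACGTNACGT", "ACG-TACGT"]))  # one ambiguity code only
print(guess_data_type(["MKVLAAGIT"]))               # mostly non-ACGT letters
```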

MRCAPriors imported through NEXUS files (using a calibrate entry) were not logged in the
trace log, but now they are.

BEAST

There are a few tree parser fixes.

StarBeastStartState now takes the bounds of the parameters it sets into account, so if you specify
bounds on the birth rate or any of the population sizes the initial state will not violate
them. Previously, these bounds were ignored, with the result that the analysis could not
be started.

Operator schedules can now be nested. This means that you can specify a portion of the
operator weights to

Improved error reporting (as usual).

Package manager

Added -version flag to specify exactly which package version to install.

TreeAnnotator

Now calculates 2D HPD intervals by default (for phylogeography analyses). Spread3 requires these
intervals to be available, but they were not calculated by default, which confused
several users.

Added -nohpd2D flag to suppress 2D HPD interval calculation, since any 2-dimensional continuous
trait that is logged on the tree will now by default get a 2D HPD interval calculated. However,
if the interval is not contiguous, TreeAnnotator produces a warning message, which may not
be appropriate for anything but geographical regions. This flag helps suppress the messages and
reduces calculation time.

Added -noSA flag to suppress tree set being seen as that of a sampled ancestor analysis.
It can happen that a tree set contains a branch of length zero, which is interpreted by
TreeAnnotator as a sampled ancestor tree. Setting this flag prevents this interpretation.

What is new in v2.4.4

This release smooths out some issues with importing NEXUS files in BEAUti. A NEXUS file can contain information about calibrations on clades and tip ages, which is more convenient when there are many calibrations or dated tips. In the previous release, BEAUti displayed every distribution as ‘Uniform’, even when other distributions were specified in the NEXUS file, and the distribution could not be changed in BEAUti. This is fixed now. In some circumstances, specifying tip dates caused a major problem that prevented BEAUti from setting up connections, which showed up as missing priors and other components. This is also fixed in this release.

A TreeAnnotator fix was made so user defined trees can be annotated instead of using an MCC tree based on the tree file.

Allow smaller log files by logging fewer significant digits of metadata. The TreeWithMetaDataLogger has a “dp” flag that can be used to specify the number of decimal places to use writing branch lengths, rates and now also real-valued metadata. When logging large trees this can reduce the size of the file considerably.
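To get a feel for how much the number of decimal places matters, here is a small sketch that formats one branch annotation in full and with four decimal places. The annotation layout and the helper name are illustrative assumptions; this is not the logger's code.

```python
def format_value(value, dp=None):
    """Format a number with dp decimal places, or in full when dp is None."""
    return repr(value) if dp is None else f"{value:.{dp}f}"


rate, length = 0.0012345678901234, 0.123456789012

# One branch with a rate annotation, written in full and with dp=4.
full = f"[&rate={format_value(rate)}]:{format_value(length)}"
short = f"[&rate={format_value(rate, 4)}]:{format_value(length, 4)}"
print(len(full), len(short))
```

The saving applies to every branch of every logged tree, which is why it adds up for large trees.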

Fix for a problem that prevented starting any BEAST application on Mac Sierra. The OSX version was released on a computer that did not run Sierra, and it appears that this prevented any BEAST application from opening under the new security settings in Sierra, though it did not prevent them from running on any older version of OSX.

Metropolis Coupled MCMC(MC3) works?

19 May 2015 by Remco Bouckaert

Metropolis coupled MCMC (MCMCMC or MC3) allows running an MCMC analysis together with a number of ‘heated’ chains. These heated chains run over a distribution that is adjusted so that it is less peaked than the posterior we want to sample from, which means it is easier for these heated chains to move away from a local optimum. At regular intervals there is the option to switch states between chains (depending on a stochastic criterion), including the chain that samples from the posterior. This is supposed to help explore the sample space more efficiently.
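The stochastic criterion can be sketched as follows. This uses the textbook MC3 formulation, in which chain i samples from the posterior raised to a power beta_i (beta = 1 for the unheated chain, smaller for heated ones); it is not taken from the BEASTLabs source, and the function names are made up.

```python
import math
import random


def log_swap_ratio(beta_i, beta_j, log_post_i, log_post_j):
    """Log acceptance ratio for swapping the states of chains i and j."""
    return (beta_i - beta_j) * (log_post_j - log_post_i)


def accept_swap(beta_i, beta_j, log_post_i, log_post_j, rng=random):
    """Metropolis rule: accept with probability min(1, exp(log ratio))."""
    log_ratio = log_swap_ratio(beta_i, beta_j, log_post_i, log_post_j)
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


# The cold chain (beta=1) sits at log posterior -100, a heated chain
# (beta=0.5) has wandered to -90: the swap is always accepted.
print(accept_swap(1.0, 0.5, -100.0, -90.0))
```

A swap that would move the cold chain to a worse state is still possible, but only with probability exp(log ratio).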

To set up an MCMCMC analysis in BEAST, you need to install the BEASTLabs package. The easiest way to set up the XML is to set up a simple MCMC analysis in BEAUti, save the file and edit the XML by

  • replacing the spec attribute in the run element by "beast.inference.MCMCMC".
  • adding a chains attribute with the number of chains you want to run.

After this, the XML should look something like this:
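As a minimal sketch, the two edits can also be scripted in Python with the standard library. The stripped-down XML in the example is a hypothetical stand-in for a BEAUti-generated file, and the function name is made up for illustration; the printed run element carries the new spec and chains attributes.

```python
import xml.etree.ElementTree as ET


def make_mcmcmc(xml_text, chains=4):
    """Return the XML with the run element switched from MCMC to MCMCMC."""
    root = ET.fromstring(xml_text)
    run = root.find("run")  # assumes <run> is a direct child of <beast>
    if run is None:
        raise ValueError("no run element found")
    run.set("spec", "beast.inference.MCMCMC")
    run.set("chains", str(chains))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened analysis file.
example = '<beast version="2.0"><run id="mcmc" spec="MCMC" chainLength="10000000"/></beast>'
print(make_mcmcmc(example, chains=4))
```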

  

When running the analysis, you want to have at least as many cores as there are chains, so that each chain thread can run on its own core. The current implementation is multi-threaded, but does not support multi-processors (yet).

Does MCMCMC work?

The question remains whether it is better to run say 4 individual MCMC analyses and combine results instead of running a single MCMCMC analysis. From what I have seen so far, the BEAST proposals are typically very well tuned to explore tree space, and can handle correlations between various parameters quite well. If a BEAST analysis gets stuck (which shows up when different chains each seem to converge, but all end up at a different posterior), anecdotal evidence with *BEAST analyses suggests that throwing MCMCMC at it does not solve the problem.

So, there are two criteria for judging whether MC3 works or not:

  • Can it get us out of local optima, where MCMC by itself has trouble?
  • Can it produce better effective sample size (ESS) per computer cycle?

I can imagine that MC3 works in some cases, and it has been around for ages (notably in MrBayes), but perhaps this is due to the kind of MCMC proposals used, and maybe BEAST analyses do not benefit from MC3. I have not seen an example yet, so if you have a BEAST analysis where MC3 produces better results than MCMC alone, please let me know!

What is new in v2.4.3

BEAUti

If you want to sample tip dates, you can create an MRCA prior in the priors panel (by clicking the little ‘+’ button at the bottom of the screen). Once you specified the set of taxa and an age distribution, click the ‘tips only’ checkbox, and a tip sampling operator will be automatically added.

Multi-monophyletic constraints through Newick

BEAUti now allows packages to add package specific priors. When you click the ‘+’ button at the bottom of the priors tab, and a package (such as BEASTLabs, which can add a multi-monophyletic prior) provides a new prior, a dialog pops up showing a list to choose from. By default, an MRCA prior is added if no other package provides anything. The multi-monophyletic prior from BEASTLabs allows you to specify a large number of monophyletic constraints through a tree in Newick format.
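To see why a Newick string is a compact way of writing down many constraints, note that every pair of parentheses groups a set of taxa. The sketch below lists those sets, on the assumption that each internal node other than the root corresponds to one monophyletic constraint. It handles plain topologies only (no branch lengths or internal labels), and it is not the BEASTLabs parser.

```python
def clades_from_newick(newick):
    """List the taxa under every internal node of a Newick tree, root excluded."""
    stack = []   # one list of taxa per currently open parenthesis
    clades = []
    name = ""
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append([])
        elif ch in ",)":
            if name.strip():
                stack[-1].append(name.strip())
            name = ""
            if ch == ")":
                taxa = stack.pop()
                if stack:  # the root itself is not a constraint
                    clades.append(sorted(taxa))
                    stack[-1].extend(taxa)
        else:
            name += ch
    return clades


print(clades_from_newick("((A,B),(C,(D,E)));"))
```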

Microsatellite support

Another way packages can now extend BEAUti is by catering for package specific file formats. For example, the BEASTvntr package reads in alignments from a comma separated file format and interprets them as numbers of tandem repeats. The BEASTvntr package provides microsatellite support.

Misc

Gamma distribution now allows multiple parameterisations: shape/scale, shape/rate, shape/mean, and one parameter, but defaults to shape/scale as in previous versions.
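The first three parameterisations describe the same family of distributions and are related by standard identities (rate = 1/scale, mean = shape × scale). The sketch below converts each to shape/scale; the mode labels are names chosen for this example, and the one-parameter form is left out because its definition is not spelled out here.

```python
def to_shape_scale(shape, second, mode="ShapeScale"):
    """Convert a Gamma parameterisation to (shape, scale).

    `second` is the scale, rate or mean, depending on `mode`.
    """
    if mode == "ShapeScale":
        return shape, second
    if mode == "ShapeRate":
        return shape, 1.0 / second   # rate is the inverse of scale
    if mode == "ShapeMean":
        return shape, second / shape  # mean = shape * scale
    raise ValueError(f"unknown mode: {mode}")


print(to_shape_scale(2.0, 4.0, "ShapeRate"))
print(to_shape_scale(2.0, 4.0, "ShapeMean"))
```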

When saving a file to XML from BEAUti, all packages used in the XML are now encoded in XML, so when starting BEAST on a different computer, it can provide better error reporting of missing packages.

Better looking on high-res screens.

BEAST

Allow multiple citation annotations per class.

Allow trait sets with unspecified dates instead of failing when not all taxa had a date specified.

Allow multiple arguments to Sum so you can add values from various sources.

Improved error reporting, as usual.

Package Manager

The GUI version of the package manager now has links to documentation. By clicking the link of a particular package, a web browser opens that should bring you to a page with package specific information.

Some work has been done to make the layout of the package manager look better.

Misc

TreeAnnotator fix for phylogeography in low-mem mode — previously, any meta data in array format such as location information was ignored.

LogCombiner suppresses duplicate ‘=’ in tree output.

What is new in v2.4.2

BEAUti has a menu — View/Zoom In and View/Zoom out — which causes everything to scale up or down respectively. Once a particular zoom level is set in BEAUti, all other applications with a graphical user interface, like BEAST, TreeAnnotator, LogCombiner, etc. scale up to the same level. Also, by default scaling is such that they should look acceptable on high resolution screens.

Both BEAUti and BEAST have some improved error reporting.

One annoying bug was that BEAST closed its console window on XML parsing errors, making it impossible to read what was wrong with the XML file. This bug is solved now.

LogCombiner used to read in all log and tree files before writing them to the combined files. The new implementation processes input files line by line and directly writes them to the combined log, so it requires much less memory than before.
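The difference between the two approaches can be sketched in a few lines. The example below streams tab-separated trace logs into one output and never holds more than a single line in memory. It is a simplification under stated assumptions: it keeps one header, skips comment lines, and does no burn-in removal or renumbering of samples, so it does not reproduce LogCombiner.

```python
import io


def combine_logs(inputs, output):
    """Stream several trace logs into one output, keeping a single header."""
    header_written = False
    for handle in inputs:
        header_seen = False
        for line in handle:  # one line at a time, never the whole file
            if line.startswith("#") or not line.strip():
                continue
            if not header_seen:
                header_seen = True
                if header_written:
                    continue  # drop the repeated header of later files
                header_written = True
            output.write(line.rstrip("\n") + "\n")


# In-memory stand-ins for two log files.
first = io.StringIO("# BEAST log\nSample\tposterior\n0\t-10.0\n1000\t-9.5\n")
second = io.StringIO("Sample\tposterior\n0\t-11.0\n1000\t-9.0\n")
combined = io.StringIO()
combine_logs([first, second], combined)
print(combined.getvalue())
```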

DensiTree updated to version 2.2.5, which supports export of DensiTree in SVG vector format.

What is new in v2.4.1

BEAUti

BEAUti now allows imports of calibrations from NEXUS files, so you can specify tip dates, distributions on tip dates, monophyletic constraints and clade calibrations in a NEXUS file. This is especially handy when there are a large number of calibrations or when a large number of clades need to be defined.

BEAUti now has a “File/Launch Apps” menu to start applications provided by packages, such as the GUI for doing a Path Sampling analysis (as the AppStore does).

In Windows and Linux, the *BEAST template went missing the second time BEAUti was started due to a bug in the way packages are handled. This is fixed now.

Streamlined upgrades of BEAST, so upgrading BEAST is now as simple as upgrading any package. When upgrading BEAST, BEAUti exits and when restarting it downloads the latest version, which may take a little time.

BEAST

On OSX, a common problem was that a CUDA driver was installed to support BEAGLE, but that there is no hardware that is CUDA enabled. The result was a crash of BEAST without an error message, which made it hard to find out what went wrong. In this version a test is done for this condition, and if it exists, instructions are provided on how to uninstall CUDA drivers, which should fix the problem.

The CLI script for BEAST should have less trouble loading the BEAGLE library in Linux and OSX.

Two operators have improved operator tuning resulting in slightly better performance (higher ESSs) in most cases.

There are some improvements in reporting error conditions, which should help diagnose problems.

LogAnalyser

A bug crept into v2.4.0 causing LogAnalyser not to show progress on loading and processing the log file when started from CLI, which is fixed now.

BEAST 1 vs 2 performance benchmarking

March 2016 by Remco Bouckaert, Tim Vaughan, Walter Xie, and Alexei Drummond

Recently, a few users reported problems with BEAST 2 performance, concluding it was worse than BEAST 1. This puzzled us, because BEAST 1 and 2 share the same core algorithms, and both spend most of their time doing phylogenetic likelihood calculations, which are optimised using BEAGLE, a library shared by both programs. In fact, recently we changed the way that BEAST 2 handles proportion invariant categories, saving some phylogenetic likelihood calculations, so in theory it should be faster when using a proportion of invariant sites in the model. So, we became curious whether there are real performance differences between BEAST 1 and 2 and decided to do a benchmark. We expected them to perform roughly the same on GTR and GTR+G analyses, and BEAST 2 to do better on GTR+I and GTR+G+I analyses.

The picture below summarises the speed of BEAST 2 over BEAST 1 using 1, 2, 4 thread(s) in the 3 different operating systems. As you can see the performance is very similar for GTR and GTR+G, with BEAST 2 being perhaps slightly faster (although this could be due to debugging that BEAST 1 performs at the start of the chain):

 

What we did

Analyses

BEAST can do many kinds of analyses, but for the purpose of this benchmark, we want to see whether the TreeLikelihood calculations, which typically dominate the computational time of MCMC runs, are comparable. To see the impact of the way BEAST 2 handles proportion invariant, we want to have an analysis with and without a proportion invariant category. And since many analyses use gamma rate heterogeneity with and without proportion invariants, we end up with four variants:

  • GTR
  • GTR + 4 gamma categories
  • GTR + proportion invariant
  • GTR + 4 gamma categories + proportion invariant

To keep things otherwise simple, we use a Yule tree prior, a strict clock and start with a random tree. To be practical, we set up the analysis in BEAUti 1 and 2, just importing an alignment, choosing the site model, setting the tree prior in BEAST 1 (BEAST 2 uses Yule by default) and save to file. As it turns out, the analyses produced that way are almost the same, but there are some small differences in the operator settings. Due to auto-optimisation, they will eventually become almost the same, but to make the two analyses as equal as possible we edited the XML so that they have the same operator weights and tuning values. Also, the population size used to generate the random starting tree differed so these were made the same as well.

The MCMC runs were run for 1 million steps in order to make them long enough that the slightly different ways extra likelihood calculations are done at the start for debugging purposes have little effect on the outcome. Also, with longer runs JIT compiler differences are eliminated. We took care to run the different programs under the same circumstances, on a computer not doing any other jobs at the time.

This whole process was automated to deal with the various data sets we wanted to test.

Threading

The way to set up threads in BEAST 2 is a bit cumbersome (v2.4.0 improves things a lot), so perhaps the reported differences are due to different threading configurations. Therefore, we want to see what the impact of threading is. That led us to 3 variants:

  • 1 thread BEAGLE SSE
  • 2 threads BEAGLE SSE
  • 4 threads BEAGLE SSE

For BEAST 1, we used the flags -overwrite -beagle_instances. For BEAST 2 we used -overwrite -threads for the SSE runs. For all cases, we verified that both programs use the same settings of BEAGLE as reported at the start of the run.
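For readers who want to script similar runs, here is a rough sketch of how the command lines could be assembled and timed. It assumes that each threading flag is followed by the number of threads, and the executable path and XML file name in the example are placeholders; the actual scripts are in the benchmark repository and may look different.

```python
import subprocess
import time

# Flags as listed above; the thread count is assumed to follow the last flag.
FLAGS = {
    "beast1": ["-overwrite", "-beagle_instances"],
    "beast2": ["-overwrite", "-threads"],
}


def benchmark_command(executable, version, threads, xml_file):
    """Build the command line for one benchmark run."""
    return [executable, *FLAGS[version], str(threads), xml_file]


def time_run(command):
    """Run a command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


# Placeholder path and file name, for illustration only.
print(benchmark_command("/path/to/beast", "beast2", 2, "analysis.xml"))
```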

Data sets

To get an impression of the impact of different data, we randomly selected a number of data sets from treebase.org with a number of sizes. We also used the data sets from the BEAST 1 examples benchmark directory giving a total of 15 data sets.

dataset         taxa   sites   patterns
M1044             50    1133        493
M1366             41    1137        769
M1510             36    1812       1020
M1748             67     955        336
M1749             74    2253       1673
M1809             59    1824       1037
M336              27    1949        934
M3475             50     378        256
M501              29    2520       1253
M520              67    1098        534
M755              64    1008        407
M767              71    1082        446
benchmark1      1441      98        593
benchmark2        62   10869       5565
old_benchmark     17    1485        138

Versions

To have a fair comparison, we used the latest versions currently available: v1.8.3 and v2.4.0.

Results


The images below show the run time for 1, 2, 4 thread(s) in Linux, where 1.8.3(t0) represents BEAST 1.8.3 run with a single thread and no threading pool.

  • 1 thread:
  • 2 threads:
  • 4 threads:

With increasing number of threads, the difference in run time in seconds decreases, but BEAST 2 is almost always slightly faster than BEAST 1 in these comparisons. However, it turned out that the data sets are too small for four threads to be of much use: the four-threaded runs tended to be slower than the two-threaded runs, and two threads is optimal for most of these datasets for both BEAST versions. This may also be a function of the hardware used.

Cursory checks of ESSs for BEAST 1 and 2 in Tracer did not show any substantial difference, which is not surprising since the same mixture of operators was used. Also, parameter estimates tended to agree between some randomly selected analyses.

To make sure that differences are not OS dependent, we ran the analyses on Windows 7, OS X and Linux, but did not find any substantial differences between the operating systems.

Conclusions

To our surprise, we found that BEAST 2 is slightly faster than BEAST 1. This is not what we expected since both programs perform the same analysis using the same BEAGLE library. Although we did our best to compare apples with apples, it is possible we overlooked something, so let us know if you find anything that can explain the differences in performance.

If you want to replicate these runs, you can find them in the benchmark repository on https://github.com/CompEvol/benchmark, which includes the data, some instructions and scripts to run them.