
Load Balancing

18 August 2014 by Remco Bouckaert

If you have a good graphics card, you can use it with BEAGLE to increase the speed of BEAST runs. If you have multiple graphics cards, the -beagle_order flag can be used when starting BEAST from the command line to tell which thread goes on which GPU. Start BEAST with the -beagle_info flag to find out what kind of hardware you have and how each device is numbered.

For example, on my aging university computer I get the following output:

                  BEAST v2.2.0 Prerelease, 2002-2014
       Bayesian Evolutionary Analysis Sampling Trees
                 Designed and developed by
Remco Bouckaert, Alexei J. Drummond, Andrew Rambaut and Marc A. Suchard
                              
               Department of Computer Science
                   University of Auckland
                  remco@cs.auckland.ac.nz
                  alexei@cs.auckland.ac.nz
                              
             Institute of Evolutionary Biology
                  University of Edinburgh
                     a.rambaut@ed.ac.uk
                              
              David Geffen School of Medicine
           University of California, Los Angeles
                     msuchard@ucla.edu
                              
                Downloads, Help & Resources:
              	http://beast2.cs.auckland.ac.nz
                              
Source code distributed under the GNU Lesser General Public License:
              	http://code.google.com/p/beast2
                              
                     BEAST developers:
	Alex Alekseyenko, Trevor Bedford, Erik Bloomquist, Joseph Heled, 
	Sebastian Hoehna, Denise Kuehnert, Philippe Lemey, Wai Lok Sibon Li, 
	Gerton Lunter, Sidney Markowitz, Vladimir Minin, Michael Defoin Platel, 
          	Oliver Pybus, Chieh-Hsi Wu, Walter Xie
                              
                         Thanks to:
    	Roald Forsberg, Beth Shapiro and Korbinian Strimmer

BEAGLE resources available:
0 : CPU
    Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALING_DYNAMIC SCALERS_RAW SCALERS_LOG VECTOR_SSE VECTOR_NONE THREADING_NONE PROCESSOR_CPU


1 : GeForce GTX 295
    Global memory (MB): 896
    Clock speed (Ghz): 1.24
    Number of cores: 240
    Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALING_DYNAMIC SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA


2 : GeForce GTX 295
    Global memory (MB): 895
    Clock speed (Ghz): 1.24
    Number of cores: 240
    Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALING_DYNAMIC SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA


3 : GeForce GTX 295
    Global memory (MB): 896
    Clock speed (Ghz): 1.24
    Number of cores: 240
    Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALING_DYNAMIC SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA


4 : GeForce GTX 295
    Global memory (MB): 896
    Clock speed (Ghz): 1.24
    Number of cores: 240
    Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALING_DYNAMIC SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA

It shows there are 4 GPUs, numbered 1 to 4, as well as a CPU numbered 0. It also reports some statistics, such as the number of cores and the amount of memory on each GPU. I understand that on OS X, due to a bug in the OpenCL library, the reported statistics are somewhat exaggerated (e.g. it reports 280 cores on my MacBook Air, while the HD Graphics 5000 only has 40).

If I have an analysis with two partitions, or one partition that uses the ThreadedTreeLikelihood instead of the TreeLikelihood, and I start a job with

beast -beagle_order 2,4 -threads 2 beast.xml

it will use two threads, and the first will use GPU number 2 and the second will use GPU number 4. So, you can divide your data among these cards such that all of your GPUs are utilised.
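The thread-to-device rule described above can be pictured with a small JavaScript sketch. It is only an illustration, not BEAST code: `assignDevices` is a hypothetical helper, and it assumes there is exactly one -beagle_order entry per thread.

```javascript
// Hypothetical helper (not part of BEAST): pair each thread with the BEAGLE
// resource number listed at the same position in the -beagle_order argument.
// Assumption of this sketch: exactly one entry per thread.
function assignDevices(beagleOrder, threads) {
  const resources = beagleOrder.split(",").map(Number);
  if (resources.length !== threads) {
    throw new Error("this sketch expects one resource number per thread");
  }
  // Threads are numbered from 1 so that "thread 1" is the first thread.
  return resources.map((resource, index) => ({ thread: index + 1, resource }));
}

// Mirrors: beast -beagle_order 2,4 -threads 2 beast.xml
console.log(assignDevices("2,4", 2));
```

Here the first thread is paired with resource 2 and the second with resource 4, matching the numbers reported by -beagle_info.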

But what if your dataset is too large to fit on one or more of your GPUs? Then the -beagle_order flag can be used to put some of the partitions on GPUs and others on CPUs. Say you have a single GPU numbered 1 and a CPU numbered 0; then using

beast -beagle_order 0,0,0,1 -threads 4 beast.xml

will place the first three threads on the CPU and the last on the GPU. Typically, the CPU and GPU do not run at the same speed, and I’ll assume the GPU is much faster. What you will see when running the above is that the three CPU threads are running behind the GPU. So the GPU thread will be waiting for the CPU threads to finish, and this shows up as a CPU load well below 400%. On Mac or Linux I use ‘top’ to see how much CPU load there is. If you could put less data on the CPUs and more on the GPU, you could utilise your computer more efficiently.

Now you can: this requires the BEAST v2.2.0 pre-release and BEASTlabs for v2.2.0. The ThreadedTreeLikelihood has a proportions attribute where you can specify how much of the data should go into each thread. By default, the ThreadedTreeLikelihood splits up the data in equal parts, which is fine if you only use CPUs or only GPUs. But when you mix devices, the proportions attribute specifies the proportion of patterns used per thread as a space-delimited string. This is useful when using a mixture of BEAGLE devices that run at different speeds, e.g. GPU and CPU. The list is repeated cyclically if there are more threads than proportions specified. For example, ‘1 2’ as well as ‘33 66’ with 2 threads specifies that the first thread gets one third of the patterns and the second two thirds. With 3 threads, it is interpreted as ‘1 2 1’ = 25%, 50%, 25%, and with 7 threads it is ‘1 2 1 2 1 2 1’ = 10%, 20%, 10%, 20%, 10%, 20%, 10%. If not specified, all threads get the same proportion of patterns.
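The arithmetic behind the proportions attribute can be sketched in a few lines of JavaScript. This is an illustration of the rule described above, not the BEASTlabs implementation; `expandProportions` is a hypothetical name, and what happens when more proportions than threads are given is an assumption of the sketch (the extra values are ignored).

```javascript
// Hypothetical helper (not BEASTlabs code): turn a proportions string into the
// fraction of patterns each thread receives.
function expandProportions(proportions, threads) {
  const weights = proportions.trim().split(/\s+/).map(Number);
  const perThread = [];
  for (let i = 0; i < threads; i++) {
    // Repeat the list cyclically when there are more threads than values.
    perThread.push(weights[i % weights.length]);
  }
  const total = perThread.reduce((sum, w) => sum + w, 0);
  return perThread.map((w) => w / total);
}

console.log(expandProportions("1 2", 3)); // 0.25, 0.5, 0.25
console.log(expandProportions("1 2", 7)); // 0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1
```

Because the weights are normalised by their sum, ‘1 2’ and ‘33 66’ give the same fractions for 2 threads.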

By keeping an eye on CPU utilisation, you can see how changing the proportions has an impact on CPU load.
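To get a feel for why the proportions matter, here is a toy model in JavaScript. It is a rough sketch under strong assumptions, not a description of how BEAST schedules work: each thread's time is taken to be its share of the patterns divided by a relative speed, all threads wait for the slowest one, and the factor of 4 between GPU and CPU is a made-up figure.

```javascript
// Toy model (an assumption of this sketch, not BEAST code): thread i handles
// shares[i] of the patterns at relative speed speeds[i]. Every evaluation lasts
// as long as the slowest thread, so a thread's busy fraction is its own time
// divided by that maximum.
function busyFractions(shares, speeds) {
  const times = shares.map((share, i) => share / speeds[i]);
  const slowest = Math.max(...times);
  return times.map((t) => t / slowest);
}

// Three CPU threads plus one GPU thread assumed to be 4 times faster.
const relativeSpeeds = [1, 1, 1, 4];

// Equal split: the fast thread is idle most of the time.
console.log(busyFractions([0.25, 0.25, 0.25, 0.25], relativeSpeeds));

// Shares matching the speeds (proportions "1 1 1 4"): no thread has to wait.
console.log(busyFractions([1 / 7, 1 / 7, 1 / 7, 4 / 7], relativeSpeeds));
```

In this model the best proportions are simply the relative speeds of the devices. In practice those speeds are not known in advance, which is why watching the load while trying different proportions is the practical route.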

Note that combining the -beagle_GPU and -beagle_SSE flags causes all threads to use CPUs. So when you mix GPU and CPU threads, the CPU threads cannot use the SSE instructions, which means that in most practical cases some speed is lost.

Programming BEAST without Java

14 April 2014 by Remco Bouckaert

If you want to log, say, a simple function of a parameter or set of parameters, programming a logger in Java is often overkill. A much simpler way is to use some of the scripting functionality that is available in BEAST. There are a number of options:

  • RPNCalculator, which takes expressions in reverse Polish notation
  • ExpCalculator in the feast package, which takes arithmetic expressions
  • Script in the BEASTLabs package, which takes complex arithmetic expressions as well as functions

    RPNCalculator

    RPNCalculator is the simplest and most primitive of the lot. It takes as input a set of Functions and an expression in reverse Polish notation (RPN). RPN is stack based: arguments come first, and whenever an operator is encountered, it is applied to the topmost values on the stack, which are replaced by the result. So “2 3 +” returns 5, and “2 2 2 * /” returns 0.5, whereas “2 2 2 / /” returns 2, since it computes 2 / (2 / 2). Variable names are resolved by the IDs of the arguments. Below, a complete XML fragment:

     
    	
    	  
    
    
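To make the stack mechanics concrete, here is a minimal RPN evaluator in JavaScript. It is a sketch of the general technique, not the RPNCalculator source: `evalRpn` is a hypothetical name, only the four basic operators are handled, and the operand order (the value pushed first is the left operand) is the usual RPN convention, assumed here.

```javascript
// Minimal RPN evaluator (illustrative sketch, not BEAST's RPNCalculator).
// Tokens are numbers, variable names looked up in `variables`, or operators.
function evalRpn(expression, variables = {}) {
  const operators = new Map([
    ["+", (x, y) => x + y],
    ["-", (x, y) => x - y],
    ["*", (x, y) => x * y],
    ["/", (x, y) => x / y],
  ]);
  const stack = [];
  for (const token of expression.trim().split(/\s+/)) {
    if (operators.has(token)) {
      if (stack.length < 2) throw new Error("not enough operands for " + token);
      const y = stack.pop(); // pushed last: right operand
      const x = stack.pop(); // pushed first: left operand
      stack.push(operators.get(token)(x, y));
    } else if (Object.prototype.hasOwnProperty.call(variables, token)) {
      stack.push(variables[token]); // resolve a variable by its name
    } else {
      const value = Number(token);
      if (Number.isNaN(value)) throw new Error("unknown token: " + token);
      stack.push(value);
    }
  }
  if (stack.length !== 1) throw new Error("malformed expression");
  return stack[0];
}

console.log(evalRpn("2 3 +"));                    // 5
console.log(evalRpn("2 2 2 * /"));                // 0.5
console.log(evalRpn("a b *", { a: 3, b: 4 }));    // 12
```

The `variables` object plays the role of the Functions whose IDs are used as names in the expression.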

    ExpCalculator

    ExpCalculator can be found in the feast package and allows simple expressions to be used as a Function. For example, to calculate the Euclidean distance to the point (20,50), you could use the following:

    20
    50
    
    
    	
    	
    
    
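For reference, the quantity being logged here is the ordinary Euclidean distance. The JavaScript below is a plain restatement of that formula; it does not show ExpCalculator syntax, and `distanceToPoint` is a hypothetical name.

```javascript
// Euclidean distance from (x, y) to a fixed point, by default (20, 50).
// Illustrative only: this is not ExpCalculator expression syntax.
function distanceToPoint(x, y, px = 20, py = 50) {
  const dx = x - px;
  const dy = y - py;
  return Math.sqrt(dx * dx + dy * dy);
}

console.log(distanceToPoint(23, 54)); // 5
console.log(distanceToPoint(20, 50)); // 0
```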

    There is also an ExpCalculatorDistribution that you can use as a distribution, where you specify an expression that represents the log-density of the distribution. For example, ignoring constants, a normal distribution could be specified like so:

    
    

    which can be used in, for example, the prior. For more information, see the feast package.
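As a reminder of what "ignoring constants" means for the normal distribution, the JavaScript sketch below drops the -0.5 * log(2π) term from the log-density. It illustrates the formula only and is not ExpCalculatorDistribution syntax; the function names are hypothetical, and keeping the -log(sigma) term is a choice of this sketch (it is also constant when sigma is fixed).

```javascript
// Log-density of a normal distribution without the -0.5 * log(2 * pi) constant.
// Illustrative sketch; not the expression syntax used by the feast package.
function normalLogDensityNoConstant(x, mean, sigma) {
  const z = (x - mean) / sigma;
  return -0.5 * z * z - Math.log(sigma);
}

// Full log-density, for comparison.
function normalLogDensity(x, mean, sigma) {
  return normalLogDensityNoConstant(x, mean, sigma) - 0.5 * Math.log(2 * Math.PI);
}

// The two differ by the same amount for every x, so an MCMC sampler,
// which only uses differences in log-density, is unaffected.
console.log(normalLogDensity(1.5, 0, 1) - normalLogDensityNoConstant(1.5, 0, 1));
console.log(normalLogDensity(-3, 0, 1) - normalLogDensityNoConstant(-3, 0, 1));
```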

    Script

    To use Script you need to install the BEASTLabs package (see the BEAST website for how to install packages).

    With beast.util.Script you can now run complex mathematical expressions like

    3 * sin(a[0]) + log(a[1]) * b 

    where a is a Function with 2 dimensions and b is a single-valued Function. Parameters and Distributions are Functions, so you can use these in your expressions. Since Script is also a Function, you can use the result of a Script in another Script.
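Since Script expressions use JavaScript syntax with the Math functions in scope, the expression above can be tried out in plain JavaScript. The sketch below mimics that scope by pulling sin and log out of Math; how Script sets up the scope internally is not shown here, and the values of a and b are just illustrative.

```javascript
// Evaluate 3 * sin(a[0]) + log(a[1]) * b outside BEAST.
// `a` stands in for a two-dimensional Function, `b` for a one-dimensional one.
function evaluateExpression(a, b) {
  const { sin, log } = Math; // mimic the Math scope available to expressions
  return 3 * sin(a[0]) + log(a[1]) * b;
}

// Illustrative values: a = (1.0, 2.0), b = 3.0
console.log(evaluateExpression([1.0, 2.0], 3.0)); // about 4.6039
```

Trying an expression this way before putting it in the XML can save some time, given how few clues the script engine gives when something is wrong.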

    Script has an input named x, which takes the Functions used in the expression, and the variable names in the expression must match the IDs of these input values. A complete XML fragment for logging the above expression with two parameters a and b could look something like this:

    1.0 2.0
    3.0
    
    
    	
    	
    
    

    With Script, you can define complex and recursive functions, for example factorial, like so:

    function fac(x) {
            if (x <= 1) {return 1;}
            return x * fac(x-1);
    }
    
    function f(a) {return fac(a);}
    

    Note that if you specify a script instead of an expression, the engine always calls function f, with the inputs x as arguments in order of appearance. The function specification goes as text inside the element, unlike an expression, which goes inside the expression attribute. An XML fragment logging the factorial of parameter a could look something like this:

    5.0
    
    
    	
    
    	function fac(x) {
    		    if (x <= 1) {return 1;}
    		    return x * fac(x-1);
    	}
    
    	function f(a) {return fac(a);}
    
    
    

    Note that because the text is XML, any special XML character (especially '<' and '&') needs to be escaped. The <= in the above script must be replaced by &lt;=. To avoid this, you can wrap the text in a CDATA block, the content of which is not interpreted as XML but taken as is.
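The two options can be illustrated with a couple of small JavaScript helpers. These are hypothetical utilities for preparing script text by hand, not part of BEAST; they only cover '&' and '<', the two characters mentioned above.

```javascript
// Escape the characters that would otherwise be read as XML markup.
// '&' must be handled first so the '&' in "&lt;" is not escaped again.
function escapeXmlText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Alternatively, wrap the text in a CDATA block so it is taken as is.
// A CDATA block cannot itself contain the terminator "]]>".
function wrapInCdata(text) {
  if (text.includes("]]>")) {
    throw new Error("text contains ]]> and cannot be wrapped as is");
  }
  return "<![CDATA[" + text + "]]>";
}

const scriptLine = "if (x <= 1) {return 1;}";
console.log(escapeXmlText(scriptLine)); // if (x &lt;= 1) {return 1;}
console.log(wrapInCdata(scriptLine));   // <![CDATA[if (x <= 1) {return 1;}]]>
```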

    5.0
    
    
    
    

    Script syntax

    By default, Script uses JavaScript syntax for coding expressions and functions. For expressions, the Math scope is used, so the following Math functions are available: abs(), acos(), asin(), atan(), atan2(), ceil(), cos(), exp(), floor(), log(), max(), min(), pow(), random(), round(), sin(), sqrt(), tan(). A disadvantage is that debugging is awkward, to put it mildly. If a script does not compile, the ScriptEngine does not offer many clues as to where things went wrong.

    You can specify other script engines by setting the engine attribute to one of JavaScript, python, jruby, or groovy. Provided the script engine is available in your class path, you can then use other syntax for specifying a script (though not for expressions). For example, to use python, you need to include jython in the class path and set engine='python' on the Script element. A factorial logger could be done like this in python:

    
    
    

    So far, these scripts are rather simple, and are effectively only usable for logging advanced information. In a later blog post, we will look at how to use scripts in other situations. But for now, happy scripting!