Tag Archives: epidemiology

The coalescent SIR model in BEAST

18 August 2014 by Remco Bouckaert

Today, we have a look at Volz’s coalescent SIR model, which is an epidemic model that assumes there are three states: susceptible (S), infected (I) and removed (R). To use it, you need to install the phylodynamics package through the package manager. You might want to use coalescent SIR instead of birth-death SIR (BDSIR) — also in the phylodynamics package — since the coalescent model can give smaller confidence intervals.

The coalescent SIR model

The model allows you to estimate the basic reproduction number, R0, which is the average number of people that become infected by each one that is infected. R0 is important, since it mostly determines the size of an epidemic. Other parameters in the model are the transmission rate beta, which is the rate at which new infections arise and the removal rate, gamma. Gamma is often called ‘recovered’ instead of ‘removed’, but since there are other ways to get into state R (by patients dying, or by becoming a sample in our data) the term ‘removed’ is more accurate. Beta and Gamma can be interpreted as birth and death rates, moving from susceptible state S into the infected state I and from the infected state I into removed state R. Further, there is the initial size of susceptible population S0 and time of origin of the infection z0. Obviously, z0 must always be before the root of the tree. At z0, it is assumed there is just a single infected person and throughout the epidemic the population size is constant at size S0+1. For more information, see this paper.

In summary, we have

  • R0, basic reproduction number
  • beta, transmission rate
  • gamma, removal rate
  • S0, initial number of susceptibles
  • z0, time of origin.

Some of these parameters are functionally related through R0=beta*S0/gamma. This means, you can estimate either R0 or beta, and all the others, and then calculate R0 or beta through the formula. There are two implementations: a deterministic and a stochastic one. The stochastic model simulates trajectories of S, I, and R by a jump process while the deterministic model solve a set of ODEs for S, I and R. The stochastic model is much slower — it can be 3 times slower than deterministic — however, stochastic processes more closely represent reality and incorporating them may significantly affect the inference in some cases. For large R0 >1.5 and large S0, deterministic coalescent SIR is both faster and accurate, but for small R0 <1.5 deterministic tends to have the smallest positive bias on R0 and more often have R0 in the 95%HDP (in a simulation study comparing stochastic SIR, deterministic SIR and BDSIR).

Using the model in XML

To use the model, you incorporate serially sampled data by specifying a date trait for sequences.

First, add state nodes to the state element — values need to be adapted for your analysis. Good values help in getting convergence — bad values may result in the chain getting stuck. Makes sure the origin is old enough that it comes before the root of the tree. An error will be thrown when it does not. Here, we estimate R0, and derive beta, so there is no state node for beta.

To set S0 take an estimate of the population size of the area susceptible to the epidemy.
Basic reproductive number R0 must be larger than 1 (otherwise there will be no epidemic) and people disagree on what a high value of R0 is — some say 2.5, but others think 10 is possible.
Gamma removal rate: a high gamma (like in the current ebola outbreak) means fewer overall infected people, high chance of death or quarantine, or recovery with immunity.
z0 must be above root of initial tree, but setting it too high means burn-in may take a long time, so lower values are recommended.


Add priors to element with id=”prior” for S0, R0, gamma and origin — these are just default priors, but if you have more information you should adapt these. For origin z0 – typically not much info available, so uniform with a limited upper bound will do. Note uniform[0,100] means there is 99% that origin is over 1 — check your units of time whether this is relevant for your analysis.


The CSIR model is effectively a tree prior, so we add it to the element with id=”prior”. This adds a deterministic CSIR prior to the analysis (for stochastic CSIR, see below):


Now, we add operators — in this case beta is solved through R0=beta*S0/gamma, so only operators for S0, gamma, R0 and origin are added:





If you have an up-down operator for clock and tree, add origin


If you want to know what the parameters are, add loggers to the trace-log:


Run the analysis in BEAST. There are R scripts (ask Alex Popinga) to get nice plots from these.

The stochastic coalescent SIR model

To add the stochastic CSIR model instead of the deterministic one, the only difference is that the tree prior should be replaced by this:



It is best to set minTraj=1 and minTrajSuccess=1 initially. If there are mixing problems, increase minTraj and minTrajSuccess (minTrajSucces<=minTraj). Increasing numSamplesFromTrajectory will slow things down considerably. 101 tends to be good enough, but in bad mixing increasing it may help, but will slow things down more than increasing minTraj, so try that first.

Coming up

There should be a BEAUti template soon so you can select Coalescent SIR as tree prior and do not worry about the details of the XML.