Skip to content

Tutorial 2

Tim Vaughan edited this page Apr 4, 2014 · 31 revisions

In this second tutorial we explore some of the more advanced features of the MASTER stochastic population dynamics simulation engine. Specifically, we demonstrate how to

  1. simulate under a model composed of an internally structured populations,
  2. alter the stochastic simulation algorithm used,
  3. estimate moments using many independent stochastic realizations, and
  4. set up state-dependent simulation end conditions.

To explore these features, we consider a model involving a number of distinct demes each of which is subject to a stochastic logistic model similar to that used in tutorial 1:

There are two clear differences to the previous model. Firstly, all reactants and products carry a subscript indicating that they represent individuals belonging to the deme "i". (The rates also carry this subscript allowing for variation of these rates between demes.) Secondly, we have included a linear decay reaction describing the non-competitive spontaneous death of individuals within the deme.

In addition to the within-deme reactions, the model also allows for the movement of individuals between demes i and j via the following stochastic migration process:

for all . The notation is used to indicate that the inter-deme migration rates are in general non-uniform and in particular that each direction may have a distinct rate.

For brevity, we focus here on the special case of 2 demes. However, the presented XML code should be easily generalizable to more complicated scenarios.

Input file specification

As in the previous tutorial, we begin by specifying the overall type of calculation we would like MASTER to perform. In this case, we intend to estimate moments of the population sizes by averaging functions of these sizes over many independent stochastic simulations. MASTER provides a specific calculation type for this purpose: the EnsembleSummary. It is specified using the following <run> element, which is contained within the <beast> element exactly as before.

<beast version='2.0' namespace='master:master.model:master.steppers:master.conditions:master.postprocessors:master.outputs'>    
  <run spec='EnsembleSummary'
       simulationTime='10'
       nSamples='1001'
       nTraj='5000'>

  </run>
</beast>

When specifying an EnsembleSummary, the <run> element must be given two additional attributes in addition to simulationTime. The first, nSamples is used to specify the number of evenly spaced time points at which the population sizes will be recorded and moments will be estimated. The second, nTraj, specifies the number of statistically independent simulations to perform.

Model

Translating the reaction scheme above into a model element for inclusion in the <run> element above is slightly different to that presented in tutorial 1 due to the population structure. The first difference is that rather than specifying the constituent populations individually using population elements, we define a population type:

    <model spec='Model' id='model'>
      <populationType spec='PopulationType' typeName='X' dim='2' id='X'/>

    </model>

In place of the populationName attribute this has a typeName attribute. Additionally, it has a dim attribute which specifies the number and arrangement of distinct populations within this type. In this case we are interested only in 2 demes, so we set the value of this attribute accordingly.

We also use a slightly different means of specifying reactions. Rather than individual reaction elements, we group similar reactions together using reactionGroup elements. For example, we use the following to specify the two possible migration reactions:

<reactionGroup spec='ReactionGroup' reactionGroupName="Migration">	
  <reaction spec='Reaction' rate="0.1">
    X[0] -> X[1]
  </reaction>
  <reaction spec='Reaction' rate="0.2">
    X[1] -> X[0]
  </reaction>
</reactionGroup>

Each reaction group contains one or more reaction elements. Note that it is not necessary to give each of these elements a reactionName attribute, as the group is named as a whole. Each reaction element contains a single reaction, specified just as in tutorial 1. A minor difference in this case, however, is that each reactant and product must contain both a population type name and a location/deme index. As we have specified that X contains only two demes, valid indices are 0 and 1.

Our model specification is complete once this reactionGroup element and those specifying the birth and death processes described above are added to the model element.

Initial state

Likewise, the initial state specification is also subtly different than in the previous tutorial. In that case, the population whose size is to be set to a non-zero could be specified using a population attribute and embedding a reference to the population element in the model specification. In this case, we must instead use a new population element so that the specific location of that population within the type can be specified.

<initialState spec='InitState'>
  <populationSize spec='PopulationSize' size='1'>
    <population spec='Population' type='@X' location="0"/>
  </populationSize>
</initialState>

Note that here we have specified that the initial state of the system include a single individual in deme 1, and an empty deme 2.

End condition

The initial state chosen above is likely to frequently lead to a premature extinction of the population via the spontaneous (linear) death process. In other words, many of the simulated population size histories generated under this model are likely to immediately decay to zero. While this is a fact of life and may be something that one wishes to include in the simulated ensemble, it is often desirable to remove these or other kinds of trajectories from the simulated trajectory pool by conditioning on whether or not the simulated system reaches a particular state. In our case, we wish to condition against such premature extinction by specifying a trajectory end condition using the following element:

<populationEndCondition spec='PopulationEndCondition'
		        threshold="0"
			exceedCondition="false"
			isRejection="true">
  <population spec='Population' type='@X' location='0'/>
  <population spec='Population' type='@X' location='1'/>
</populationEndCondition>

Here we have stated that the calculation of a given trajectory must be stopped if ever the sum of the sizes of the populations specified by the included <population> elements is found to be less than or equal to 0. (Setting the exceedCondition attribute to "true" causes the direction of the inequality to be reversed.) Exactly what occurs when the condition is met is specified by the isRejection element which, if set to true as is the case here, causes the partially completed trajectory to be completely discarded. This is the only option for EnsembleSummary calculations. For Trajectory and Ensemble calculations, however, partially complete trajectories can be included in the output by setting isRejection="false".

Moments

We come now to what might be considered the crux of this tutorial. Moment estimation is an important use of stochastic numerical approaches, as it is often only the means and variances of population sizes that give any indication of how well a stochastic model agrees with reality. The central idea of this kind of calculation is that for any reasonable function of the population size(s) we can write

where is the population size at time t obtained from stochastic realization l. Of course, this equation is only exact in the infinite trajectory limit. However, the utility of the Monte Carlo approach lies in the fact in many cases of interest the difference between an estimate computed using a finite number of trajectories and the true value decreases monotonically as the number of trajectories used increases. Thus, we can obtain numerical results to any required precision using only a finite number of stochastic realizations.

The most obvious moments to estimate are the expected deme population sizes. We specify that these moments should be estimated in our calculation by including the following element somewhere within the <run> element.

<momentGroup spec='MomentGroup' momentGroupName='X'>
  <moment spec='Moment'>
    <factor spec='Population' type="@X" location="0"/>
  </moment>
  <moment spec='Moment'>
    <factor spec='Population' type="@X" location="1"/>
  </moment>
</momentGroup>

The momentGroupName attribute is important, as it is used to name the results of the estimation in the output file. Each <moment> element includes one or more <factor> elements which are tied to individual populations. The sizes of all populations specified within a <moment> element are multiplied together before being averaged across trajectories. Here we have two individual moments specified, each composed of only a single population size factor.

Additionally, we may be interested in statistical correlations between the population sizes in different demes. In order to calculate this, we need to record an estimate of the moment . We can do this by including the following <moment> element on its own following the previous <momentGroup> element:

<moment spec='Moment' momentName='X1X2'>
  <factor spec='Population' type="@X" location="0"/>
  <factor spec='Population' type="@X" location="1"/>
</moment>

Note that here we must specify the attribute momentName, and that the element includes two <factor> elements specifying each population size component of the desired product.

Finally, we can keep track of the total population size by using another <momentGroup> element:

<momentGroup spec='MomentGroup' momentGroupName='N' sum="true">
  <moment spec='Moment'>
    <factor spec='Population' type="@X" location="0"/>
  </moment>
  <moment spec='Moment'>
    <factor spec='Population' type="@X" location="1"/>
  </moment>
</momentGroup>

This essentially identical in form to the element specifying the mean individual population size moments, the only difference being the inclusion of the sum="true" attribute which causes MASTER to add together the individual moment estimates specified by the <moment> elements before recording the result in the output file.

Stochastic algorithm specification

Up until this point, we have avoided explicit consideration of exactly how MASTER is generating the simulated population dynamics. In actual fact, however, there are many published algorithms for doing this, each of which have their own benefits and drawbacks. By default, MASTER uses the stochastic simulation algorithm usually credited to Daniel Gillespie (J. Comp. Phys., 1976). This algorithm, which can also be explicitly specified using the appropriate stepper, is guaranteed to produce exact draws from the true probability distribution over the stochastic trajectories.

Alternatively, MASTER allows us to select a finite time-step stochastic integration algorithm. The algorithms currently supported are fixed time-step variants of (a) the tau-leaping algorithm (as introduced by Gillespie, J. Comp. Phys., 2001), and (b) the step anticipation tau-leaping (SAL) algorithm (due to Sehl. et al., J. Comp. Biol., 2009). Although these algorithm are only asymptotically exact in the limit of small time step, they allow us to trade a small amount of accuracy for a large reduction in computational complexity. Specifically, for a given time step, the complexity of this algorithm does not explicitly depend on the size of the populations involved. Thus, such algorithms are sensible choices whenever very large populations are involved.

We tell MASTER to use the tau-leaping algorithm by including the following element anywhere inside the <run> element:

<stepper spec='TauLeapingStepper' stepSize="0.01"/>

The stepSize attribute determines the time increment that will be used in the simulation. Note that this is specified completely independently of the nSamples attribute in the ensemble summary <run> element, which only specifies the number of time points at which moments are estimated.

Complete input file

Combining all of these elements together (including the additional reaction group and output elements omitted for brevity above) yields the following MASTER simulation specification:

<beast version='2.0' namespace='master:master.model:master.steppers:master.conditions:master.postprocessors:master.outputs'>
    
  <run spec='EnsembleSummary'
       simulationTime='10'
       nSamples='1001'
       nTraj='5000'>
    
    <stepper spec='TauLeapingStepper' stepSize="0.01"/>

    <model spec='Model' id='model'>
      <populationType spec='PopulationType' typeName='X' dim='2' id='X'/>

      <reactionGroup spec='ReactionGroup' reactionGroupName='Birth'>
	<reaction spec='Reaction' rate="2.0">
	  X[0] -> 2X[0]
	</reaction>
	<reaction spec='Reaction' rate="2.0">
	  X[1] -> 2X[1]
	</reaction>
      </reactionGroup>

      <reactionGroup spec='ReactionGroup' reactionGroupName='Competition'>
	<reaction spec='Reaction' rate="0.001">
	  2X[0] -> X[0]
	</reaction>
	<reaction spec='Reaction' rate="0.001">
	  2X[1] -> X[1]
	</reaction>
      </reactionGroup>

      <reactionGroup spec='ReactionGroup' reactionGroupName='Death'>
	<reaction spec='Reaction' rate="0.2">
	  X[0] -> 0
	</reaction>
	<reaction spec='Reaction' rate="0.2">
	  X[1] -> 0
	</reaction>
      </reactionGroup>

      <reactionGroup spec='ReactionGroup' reactionGroupName="Migration">	
	<reaction spec='Reaction' rate="0.1">
	  X[0] -> X[1]
	</reaction>
	<reaction spec='Reaction' rate="0.2">
	  X[1] -> X[0]
	</reaction>
      </reactionGroup>
    </model>
    
    <initialState spec='InitState'>
      <populationSize spec='PopulationSize' size='1'>
	<population spec='Population' type='@X' location="0"/>
      </populationSize>
    </initialState>

    <populationEndCondition spec='PopulationEndCondition'
			    threshold="0"
			    exceedCondition="false"
			    isRejection="true">
      <population spec='Population' type='@X' location='0'/>
      <population spec='Population' type='@X' location='1'/>
    </populationEndCondition>
    
    <momentGroup spec='MomentGroup' momentGroupName='X'>
      <moment spec='Moment'>
	<factor spec='Population' type="@X" location="0"/>
      </moment>
      <moment spec='Moment'>
	<factor spec='Population' type="@X" location="1"/>
      </moment>
    </momentGroup>

    <moment spec='Moment' momentName='X1X2'>
      <factor spec='Population' type="@X" location="0"/>
      <factor spec='Population' type="@X" location="1"/>
    </moment>
    
    <momentGroup spec='MomentGroup' momentGroupName='N' sum="true">
      <moment spec='Moment'>
	<factor spec='Population' type="@X" location="0"/>
      </moment>
      <moment spec='Moment'>
	<factor spec='Population' type="@X" location="1"/>
      </moment>
    </momentGroup>
    
    <output spec='JsonOutput' fileName='StructuredStochasticLogistic_out.json'/>
  </run>
</beast>

Output visualization

As in the previous tutorial, we begin by loading the (now installed) rjson library and using fromJSON to load the generated output file into R:

> library(rjson)
> df <- fromJSON(file='StructuredStochasticLogistic_out.json')

When generated from an ensemble summary type calculation, the output file has a slightly different structure compared with that produced by a single trajectory calculation. This is made clear by the output of names.

> names(df)
[1] "X1X2"  "t"  "sim"  "N"  "X"

While the df$t and df$sim elements remain, we see that the remaining elements are named according to the moment and moment group names chosen in the XML. Probing deeper we find that the internal structure of these elements also has a different form:

> names(df$X)
[1] "std"  "mean"

Here df$X$mean contains the estimated moment values, while df$X$std contains their estimated standard deviations. Each of these elements are further subdivided into the individual moment time series from within the moment group. (Moments on their own rather than within a group such as "X1X2" contain only a single time series.) The other moment elements df$X1X2 and df$N are similarly subdivided.

Using this information, we can use the following R commands to plot the estimated expected population sizes and their standard deviations:

plot(df$t, df$X$mean[[1]], 'l', col='blue', lwd=2,
     ylim=c(0,2000),
     xlab='Time',
     ylab='Deme population sizes',
     main='Population size dynamics')

lines(df$t, df$X$mean[[1]]-df$X$std[[1]], col='blue', lty=2)
lines(df$t, df$X$mean[[1]]+df$X$std[[1]], col='blue', lty=2)

lines(df$t, df$X$mean[[2]], col='purple', lwd=2)
lines(df$t, df$X$mean[[2]]-df$X$std[[2]], col='purple', lty=2)
lines(df$t, df$X$mean[[2]]+df$X$std[[2]], col='purple', lty=2)

legend('topleft', inset=.05,
       c(expression(E(N[1])), expression(E(N[2])), '+/- SD'),
       lty=c(1,1,2), lwd=c(2,2,1), col=c('blue','purple','black'))

This produces the figure shown below. The asymmetry between the quasi-steady state values is due to the corresponding asymmetry in our chosen migration rates. Note also that the end conditioning removes a significant source of noise in this calculation, meaning that the standard deviations are much smaller than they would be in the absence of the state-dependent post selection.

We now use the "X1X2" moment estimate to graph the development of the relative covariance between the stochastic fluctuations in the two demes. We define this as

so that values less than 1 indicate anti-correlation, values greater than 1 indicate positive correlation and values close to or equal to 1 indicate no correlation. The following R command produces the required visualization:

plot(df$t, df$X1X2$mean[[1]]/df$X$mean[[1]]/df$X$mean[[2]],
     xlab='Time',
     ylab=expression(Cov[rel](N[1],N[2])), 'l', lwd=2,
     main='Inter-deme covariance dynamics')

and its result is shown below. Here we can clearly see that the population sizes in the two demes are almost completely anti-correlated at the start of the simulation, become significantly positively correlated during the strongest part of the exponential growth phase, before losing much of their correlation toward the end of the simulated time period. This result makes sense when one considers that the initial state consists of a single individual isolated in a single deme, meaning that anti-correlated fluctuations due to hopping between demes dominate. Once the exponential growth really kicks in, however, the major source of noise in the population sizes is due to the timing of this growth phase. At this point, knowledge that one deme's population has entered this phase is a strong indication that the other deme's population has as well: hence the transient positive correlation at this point. Finally, as the demes approach their stochastic steady states, fluctuations are primarily due to uncorrelated birth and death events.