Introduction to Catalyst and Julia for New Julia users

The Catalyst tool for the modelling of chemical reaction networks is based in the Julia programming language. While experience in Julia programming is advantageous for using Catalyst, it is not necessary for accessing some of its basic features. This tutorial serves as an introduction to Catalyst for those unfamiliar with Julia, also introducing some basic Julia concepts. Anyone who plans on using Catalyst extensively is recommended to familiarise oneself more thoroughly with the Julia programming language. A collection of resources for learning Julia can be found here, and a full documentation is available here.

Julia can be downloaded here.

Users who are already familiar with Julia can skip to the Introduction to Catalyst tutorial.

Basic Julia usage

On the surface, Julia has many similarities to languages like MATLAB, Python, and R.

Values can be assigned to variables through the use of a = sign. Values (possibly stored in variables) can be used for most basic computations.

length = 2.0
width = 4.0
area = length*width
8.0

Functions take one or more inputs (enclosed by ()) and return some output. E.g. the min function returns the minimum of two values

min(1.0, 3.0)
1.0

A line of Julia code is not required to end with ;, however, if it does, the output of that line is not displayed.

min(1.0, 3.0);

Each Julia variable has a specific type, designating what type of value it is. While not directly required to use Catalyst, this is useful to be aware of. To learn the type of a specific variable, use the typeof function. More information about types can be found here.

typeof(1.0)
Float64

Here, Float64 denotes decimal-valued numbers. Integer-valued numbers instead are of the Int64 type.

typeof(1)
Int64

Finally, we note that the first time some code is run in Julia, it has to be compiled. However, this is only required once per Julia session. Hence, the second time the same code is run, it runs much faster. E.g. try running this line of code first one time, and then one additional time. You will note that the second run is much faster.

rand(100, 100)^3.5;
100×100 Matrix{ComplexF64}:
 8421.21-0.00537535im  7073.34+0.0137052im    …  7784.07+0.0365634im
 9244.55-0.00145854im  7765.04+0.00371819im      8550.42+0.00991687im
 8700.99-0.00314425im  7309.28+0.00801346im      8050.43+0.0213625im
 9024.09+0.00492705im  7576.51-0.0125549im       8339.45-0.0334581im
 8968.59-0.00105853im  7534.03+0.00269874im      8299.17+0.00719931im
 8709.33+0.00131373im  7314.98-0.00334946im   …  8048.94-0.00893554im
 8944.35-0.00855466im  7523.63+0.0218041im       8277.94+0.0581341im
 8660.65+0.0012359im   7284.76-0.00314839im      8012.04-0.00838569im
 8311.15+0.00718863im  6997.81-0.0183253im       7691.81-0.0488742im
 9134.31+0.00222324im  7670.76-0.00566934im      8442.11-0.0151296im
        ⋮                                     ⋱  
 9057.59-6.20057e-5im  7621.23+0.000155366im     8377.66+0.000400678im
 8807.85+0.00358462im  7398.78-0.0091363im       8140.62-0.0243582im
 9684.34+0.00186518im  8139.03-0.00475464im      8959.45-0.0126802im
 9070.44+0.00367478im  7617.82-0.00936383im      8383.85-0.0249534im
 8654.18-0.00254924im  7273.38+0.00649814im   …  8011.01+0.0173286im
 9219.43-5.89495e-5im  7742.64+0.000149425im     8520.61+0.000394214im
 9312.43+0.00746522im  7821.27-0.0190278im       8604.31-0.0507347im
 8833.06-0.00138809im  7431.75+0.00353678im       8168.2+0.00942383im
  9178.5-0.00501376im  7711.46+0.0127758im       8485.11+0.0340467im

(This code creates a random 100x100 matrix, and takes it to the power of 3.5)

This is useful to know when you e.g. declare, simulate, or plot, a Catalyst model. The first time you run a command there might be a slight delay. However, subsequent runs will execute much quicker. This holds even if you do minor adjustments before the second run (such as changing simulation initial conditions).

Installing and activating packages

Except for some base Julia packages (such as Pkg, the package manager) that are available by default, Julia packages must be installed locally before they can be used. Most packages are registered with Julia, and can be added through the Pkg.add("DesiredPackage") command (where DesiredPackage is the name of the package you wish to install). We can thus install Catalyst:

using Pkg
Pkg.add("Catalyst")

Here, the command using Pkg is required to activate the Pkg package manager.

Next, we also wish to add the DifferentialEquations and Plots packages (for numeric simulation of models, and plotting, respectively).

Pkg.add("DifferentialEquations")
Pkg.add("Plots")

Once a package has been installed through the Pkg.add command, this command does not have to be repeated in further Julia sessions on the same machine.

Installing a Julia package is, however, not enough to use it. Before a package's features are used in a Julia session, it has to be loaded through the using DesiredPackage command (where DesiredPackage is the name of the package you wish to activate). This command has to be repeated whenever a Julia session is restarted.

We thus activate our three desired packages:

using Catalyst
using DifferentialEquations
using Plots

For a more detailed introduction to Julia packages, please read the Pkg documentation.

Simulating a basic Catalyst model

Now that we have some basic familiarity with Julia, and have installed and activated the required packages, we will create and simulate a basic chemical reaction network model through Catalyst.

Catalyst models are created through the @reaction_network macro. For more information on macros, please read the Julia documentation on macros. This documentation is, however, rather advanced (and not required to use Catalyst). We instead recommend that you simply familiarise yourself with the Catalyst syntax, without studying in detail how macros work and what they are.

The @reaction_network command is followed by the begin keyword, which is followed by one line for each reaction of the model. Each reaction consists of a reaction rate, followed by the reaction itself. The reaction itself contains a set of substrates and a set of products (what is consumed and produced by the reaction, respectively). These are separated by a --> arrow. Finally, the model ends with the end keyword.

Here, we create a simple birth-death model, where a single species (X) is created at rate b, and degraded at rate d. The model is stored in the variable rn.

rn = @reaction_network begin
    b, 0 --> X
    d, X --> 0
end

\[ \begin{align*} \varnothing &\xrightleftharpoons[d]{b} \mathrm{X} \end{align*} \]

For more information on how to use the Catalyst model creator (also known as the Catalyst DSL), please read the corresponding documentation.

Next, we wish to simulate our model. To do this, we need to provide some additional information to the simulator. This is

  • The initial condition. That is, the concentration or numbers of each species at the start of the simulation.
  • The timespan. That is, the timeframe over which we wish to run the simulation.
  • The parameter values. That is, the values of the model's parameters for this simulation.

The initial condition is given as a Vector. This is a type which collects several different values. To declare a vector, the values are specific within brackets, [], and separated by ,. Since we only have one species, the vector holds a single element. In this element, we set the value of X using the :X => 1.0 syntax. Here, we first denote the name of the species (with a : pre-appended), next follows a => and then the value of X. Since we wish to simulate the concentration of X over time, we will let the initial condition be decimal valued.

u0 = [:X => 1.0]
1-element Vector{Pair{Symbol, Float64}}:
 :X => 1.0

The timespan sets the time point at which we start the simulation (typically 0.0 is used) and the final time point of the simulation. These are combined into a two-valued Tuple. Tuples are similar to vectors, but are enclosed by () and not []. Again, we will let both time points be decimal valued.

tspan = (0.0, 10.0)
(0.0, 10.0)

Finally, the parameter values are, like the initial conditions, given as a vector. Since we have two parameters (b and d), the parameter vector has two values. We use a similar notation for setting the parameter values as the initial condition (first the parameter, then an arrow, then the value).

params = [:b => 1.0, :d => 0.2]
2-element Vector{Pair{Symbol, Float64}}:
 :b => 1.0
 :d => 0.2

Please read here for more information on Vectors and Tuples.

Next, before we can simulate our model, we bundle all the required information together in a so-called ODEProblem. Note that the order in which the input (the model, the initial condition, the timespan, and the parameter values) is provided to the ODEProblem matters. E.g. the parameter values cannot be provided as the first argument, but have to be the fourth argument. Here, we save our ODEProblem in the oprob variable.

oprob = ODEProblem(rn, u0, tspan, params)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 10.0)
u0: 1-element Vector{Float64}:
 1.0

We can now simulate our model. We do this by providing the ODEProblem to the solve function. We save the output to the sol variable.

sol = solve(oprob)

Finally, we can plot the solution through the plot function.

plot(sol)

Here, the plot shows the time evolution of the concentration of the species X from its initial condition.

For more information about the numerical simulation package, please see the DifferentialEquations documentation. For more information about the plotting package, please see the Plots documentation.

Additional modelling example

To make this introduction more comprehensive, we here provide another example, using a more complicated model. In addition, instead of simulating our model as concentrations evolve over time, we will simulate the individual reaction events through the Gillespie algorithm. This is a way to add noise to our model.

Remember, unless we have restarted Julia, we do not need to activate our packages (through the using command) again.

This time, we will declare the so-called SIR model for an infectious disease. Note that even if this model does not describe a set of chemical reactions, it can be modelled using the same dynamics. The model consists of 3 species:

  • S, the amount of susceptible individuals.
  • I, the amount of infected individuals.
  • R, the amount of recovered (or removed) individuals.

It also has 2 reaction events:

  • Infection, where a susceptible individual meets an infected individual and also becomes infected.
  • Recovery, where an infected individual recovers.

Each reaction is also associated with a specific rate (corresponding to a parameter).

  • b, the infection rate.
  • k, the recovery rate.

We declare the model using the @reaction_network macro, and store it in the sir_model variable.

sir_model = @reaction_network begin
    b, S + I --> 2I
    k, I --> R
end

\[ \begin{align*} \mathrm{S} + \mathrm{I} &\xrightarrow{b} 2 \mathrm{I} \\ \mathrm{I} &\xrightarrow{k} \mathrm{R} \end{align*} \]

Note that the first reaction contains two different substrates (separated by a + sign). While there is only a single product (I), two copies of I are produced. The 2 in front of the product I denotes this.

Next, we declare our initial condition, time span, and parameter values. Since we want to simulate the individual reaction events, that discretely change the state of our model, we want our initial conditions to be integer-valued. We will start with a mostly susceptible population, but where a single individual has been infected through some means.

u0 = [:S => 50, :I => 1, :R => 0.0]
tspan = (0.0, 10.0)
params = [:b => 0.2, :k => 1.0]
2-element Vector{Pair{Symbol, Float64}}:
 :b => 0.2
 :k => 1.0

Previously we have bundled this information into an ODEProblem (denoting a deterministic ordinary differential equation). Now we wish to simulate our model as a jump process (where each reaction event denotes a single jump in the state of the system). We do this by first creating a DiscreteProblem, and then using this as an input to a JumpProblem.

dprob = DiscreteProblem(sir_model, u0, tspan, params)
jprob = JumpProblem(sir_model, dprob, Direct())
JumpProblem with problem DiscreteProblem with aggregator JumpProcesses.Direct
Number of jumps with discrete aggregation: 0
Number of jumps with continuous aggregation: 0
Number of mass action jumps: 2

Again, the order in which the inputs are given to the DiscreteProblem and the JumpProblem is important. The last argument to the JumpProblem (Direct()) denotes which simulation method we wish to use. For now, we recommend the user simply use the Direct() option, and then consider alternative ones (see the JumpProcesses.jl docs) when they are more familiar with modelling in Catalyst and Julia.

Finally, we can simulate our model using the solve function, and plot the solution using the plot function. Here, the solve function also has a second argument (SSAStepper()). This is a time stepping algorithm that calls the Direct solver to advance a simulation. Again, we recommend at this stage you simply use this option, and then explore exactly what this means at a later stage.

sol = solve(jprob, SSAStepper())
plot(sol)

Exercise: Try simulating the model several times. Note that the epidemic doesn't always take off, but sometimes dies out without spreading through the population. Try changing the infection rate (b), determining how this value affects the probability that the epidemic goes through the population.


Feedback

If you are a new Julia user who has used this tutorial, and there was something you struggled with or would have liked to have explained better, please raise an issue. That way, we can continue improving this tutorial.


References