# Parameter Estimation

The parameters of a model, generated by Catalyst, can be estimated using various packages available in the Julia ecosystem. Refer here for more extensive information. Below follows a quick tutorial of how Optimization.jl can be used to fit a parameter set to data.

First, we fetch the required packages.

using Catalyst
using DifferentialEquations
using SciMLSensitivity
using Optimization

# for the ADAM optimizer
using OptimizationOptimisers

Next, we declare our model. For our example, we will use the Brusselator, a simple oscillator.

brusselator = @reaction_network begin
A, ∅ → X
1, 2X + Y → 3X
B, X → Y
1, X → ∅
end
p_real = [:A => 1., :B => 2.]
2-element Vector{Pair{Symbol, Float64}}:
:A => 1.0
:B => 2.0

We simulate our model, and from the simulation generate sampled data points (with added noise), to which we will attempt to fit a parameter et.

u0 = [:X => 1.0, :Y => 1.0]
tspan = (0.0, 30.0)

sample_times = range(tspan[1]; stop = tspan[2], length = 100)
prob = ODEProblem(brusselator, u0, tspan, p_real)
sol_real = solve(prob, Rosenbrock23(); tstops = sample_times)
sample_vals = Array(sol_real(sample_times))
sample_vals .*= (1 .+ .1 * rand(Float64, size(sample_vals)) .- .05)
nothing   # hide

We can plot the real solution, as well as the noisy samples.

using Plots
default(; lw = 3, framestyle = :box, size = (800, 400))

plot(sol_real; legend = nothing, color = [:darkblue :darkred])
scatter!(sample_times, sample_vals'; color = [:blue :red], legend = nothing)

Next, we create a function to fit the parameters using the ADAM optimizer. For a given initial estimate of the parameter values, pinit, this function will fit parameter values, p, to our data samples. tend is used to indicate the time interval over which to fit to the ODE solution.

function optimise_p(pinit, tend)
function loss(p, _)
newtimes = filter(<=(tend), sample_times)
newprob = remake(prob; tspan = (0.0, tend), p = p)
sol = Array(solve(newprob, Rosenbrock23(); saveat = newtimes))
loss = sum(abs2, sol .- sample_vals[:, 1:size(sol,2)])
return loss, sol
end

# optimize for the parameters that minimize the loss
optf = OptimizationFunction(loss, Optimization.AutoZygote())
optprob = OptimizationProblem(optf, pinit)
sol = solve(optprob, ADAM(0.1); maxiters = 100)

# return the parameters we found
return sol.u
end

Next, we will fit a parameter set to the data on the interval (0, 10).

p_estimate = optimise_p([5.0, 5.0], 10.0)

We can compare this to the real solution, as well as the sample data

newprob = remake(prob; tspan = (0., 10.), p = p_estimate)
sol_estimate = solve(newprob, Rosenbrock23())
plot(sol_real; color = [:blue :red], label = ["X real" "Y real"], linealpha = 0.2)
scatter!(sample_times, sample_vals'; color = [:blue :red],
label = ["Samples of X" "Samples of Y"], alpha = 0.4)
plot!(sol_estimate; color = [:darkblue :darkred], linestyle = :dash,
label = ["X estimated" "Y estimated"], xlimit = tspan)

Next, we use this parameter estimate as the input to the next iteration of our fitting process, this time on the interval (0, 20).

p_estimate = optimise_p(p_estimate, 20.)
newprob = remake(prob; tspan = (0., 20.), p = p_estimate)
sol_estimate = solve(newprob, Rosenbrock23())
plot(sol_real; color = [:blue :red], label = ["X real" "Y real"], linealpha = 0.2)
scatter!(sample_times, sample_vals'; color = [:blue :red],
label = ["Samples of X" "Samples of Y"], alpha = 0.4)
plot!(sol_estimate; color = [:darkblue :darkred], linestyle = :dash,
label = ["X estimated" "Y estimated"], xlimit = tspan)

Finally, we use this estimate as the input to fit a parameter set on the full time interval of the sampled data.

p_estimate = optimise_p(p_estimate, 30.0)

newprob = remake(prob; tspan = (0., 30.0), p = p_estimate)
sol_estimate = solve(newprob, Rosenbrock23())
plot(sol_real; color = [:blue :red], label = ["X real" "Y real"], linealpha = 0.2)
scatter!(sample_times, sample_vals'; color = [:blue :red],
label = ["Samples of X" "Samples of Y"], alpha = 0.4)
plot!(sol_estimate; color = [:darkblue :darkred], linestyle = :dash,
label = ["X estimated" "Y estimated"], xlimit = tspan)

The final parameter estimate is then

p_estimate

which is close to the actual parameters of [1.0, 2.0].

## Why we fit the parameters in iterations.

The reason we chose to fit the model on a smaller interval to begin with, and then extend the interval, is to avoid getting stuck in a local minimum. Here specifically, we chose our initial interval to be smaller than a full cycle of the oscillation. If we had chosen to fit a parameter set on the full interval immediately we would have received an inferior solution.

p_estimate = optimise_p([5.0,5.0], 30.0)

newprob = remake(prob; tspan = (0.0,30.0), p = p_estimate)
sol_estimate = solve(newprob, Rosenbrock23())
plot(sol_real; color = [:blue :red], label = ["X real" "Y real"], linealpha = 0.2)
scatter!(sample_times,sample_vals'; color = [:blue :red],
label = ["Samples of X" "Samples of Y"], alpha = 0.4)
plot!(sol_estimate; color = [:darkblue :darkred], linestyle = :dash,
label = ["X estimated" "Y estimated"], xlimit = tspan)