Bayesian Neural ODEs: SGLD
Recently, Neural Ordinary Differential Equations has emerged as a powerful framework for modeling physical simulations without explicitly defining the ODEs governing the system, but learning them via machine learning. However, the question: Can Bayesian learning frameworks be integrated with Neural ODEs to robustly quantify the uncertainty in the weights of a Neural ODE? remains unanswered.
In this tutorial, a working example of the Bayesian Neural ODE: SGLD sampler is shown. SGLD stands for Stochastic Langevin Gradient Descent.
For an introduction to SGLD, please refer to Introduction to SGLD in Julia
For more details regarding Bayesian Neural ODEs, please refer to Bayesian Neural Ordinary Differential Equations.
Copy-Pasteable Code
Before getting to the explanation, here's some code to start with. We will follow with a full explanation of the definition and training process:
using DiffEqFlux, DifferentialEquations, Flux
using Plots, StatsPlots
u0 = Float32[1., 1.]
p = [1.5, 1., 3., 1.]
datasize = 45
tspan = (0.0f0, 14.f0)
tsteps = tspan[1]:0.1:tspan[2]
function lv(u, p, t)
x, y = u
α, β, γ, δ = p
dx = α*x - β*x*y
dy = δ*x*y - γ*y
du = [dx, dy]
end
trueodeprob = ODEProblem(lv, u0, tspan, p)
ode_data = Array(solve(trueodeprob, Tsit5(), saveat = tsteps))
y_train = ode_data[:, 1:35]
dudt = FastChain(FastDense(2, 50, tanh), FastDense(50, 2))
prob_node = NeuralODE(dudt, (0., 14.), Tsit5(), saveat = tsteps)
train_prob = NeuralODE(dudt, (0., 3.5), Tsit5(), saveat = tsteps[1:35])
function predict(p)
Array(train_prob(u0, p))
end
function loss(p)
sum(abs2, y_train .- predict(p))
end
sgld(∇L, θᵢ, t, a = 2.5e-3, b = 0.05, γ = 0.35) = begin
ϵ = a*(b + t)^-γ
η = ϵ.*randn(size(θᵢ))
Δθᵢ = .5ϵ*∇L + η
θᵢ .-= Δθᵢ
end
parameters = []
losses = Float64[]
grad_norm = Float64[]
θ = deepcopy(prob_node.p)
@time for t in 1:45000
grad = gradient(loss, θ)[1]
sgld(grad, θ, t)
tmp = deepcopy(θ)
append!(losses, loss(θ))
append!(grad_norm, sum(abs2, grad))
append!(parameters, [tmp])
println(loss(θ))
end
plot(losses, yscale = :log10)
plot(grad_norm, yscale =:log10)
using StatsPlots
sampled_par = parameters[43000: 45000]
##################### PLOTS: LOSSES ###############
sampled_loss = [loss(p) for p in sampled_par]
density(sampled_loss)
#################### RETRODICTED PLOTS - TIME SERIES AND CONTOUR PLOTS ####################
_, i_min = findmin(sampled_loss)
plt = scatter(tsteps,ode_data[1,:], colour = :blue, label = "Data: u1", ylim = (-.5, 10.))
scatter!(plt, tsteps, ode_data[2,:], colour = :red, label = "Data: u2")
phase_plt = scatter(ode_data[1,:], ode_data[2,:], colour = :red, label = "Data", xlim = (-.25, 7.), ylim = (-2., 6.5))
for p in sampled_par
s = prob_node(u0, p)
plot!(plt, tsteps[1:35], s[1,1:35], colour = :blue, lalpha = 0.04, label =:none)
plot!(plt, tsteps[35:end], s[1, 35:end], colour =:purple, lalpha = 0.04, label =:none)
plot!(plt, tsteps[1:35], s[2,1:35], colour = :red, lalpha = 0.04, label=:none)
plot!(plt, tsteps[35:end], s[2,35:end], colour = :purple, lalpha = 0.04, label=:none)
plot!(phase_plt, s[1,1:35], s[2,1:35], colour =:red, lalpha = 0.04, label=:none)
plot!(phase_plt, s[1,35:end], s[2, 35:end], colour = :purple, lalpha = 0.04, label=:none)
end
plt
phase_plt
plot!(plt, [3.5], seriestype =:vline, colour = :green, linestyle =:dash,label = "Training Data End")
bestfit = prob_node(u0, sampled_par[i_min])
plot(bestfit)
plot!(plt, tsteps[1:35], bestfit[2, 1:35], colour =:black, label = "Training: Best fit prediction")
plot!(plt, tsteps[35:end], bestfit[2, 35:end], colour =:purple, label = "Forecasting: Best fit prediction")
plot!(plt, tsteps[1:35], bestfit[1, 1:35], colour =:black, label = :none)
plot!(plt, tsteps[35:end], bestfit[1, 35:end], colour =:purple, label = :none)
plot!(phase_plt,bestfit[1,1:40], bestfit[2, 1:40], colour = :black, label = "Training: Best fit prediction")
plot!(phase_plt,bestfit[1, 40:end], bestfit[2, 40:end], colour = :purple, label = "Forecasting: Best fit prediction")
savefig(plt, "C:/Users/16174/Desktop/Julia Lab/MSML2021/BayesianNODE_SGLD_Plot1.png")
savefig(phase_plt, "C:/Users/16174/Desktop/Julia Lab/MSML2021/BayesianNODE_SGLD_Plot2.png")
Time Series Plots:
Contour Plots:
Explanation
Step1: Get the data from the Lotka Volterra ODE example
u0 = Float32[1., 1.]
p = [1.5, 1., 3., 1.]
datasize = 45
tspan = (0.0f0, 14.f0)
tsteps = tspan[1]:0.1:tspan[2]
function lv(u, p, t)
x, y = u
α, β, γ, δ = p
dx = α*x - β*x*y
dy = δ*x*y - γ*y
du = [dx, dy]
end
trueodeprob = ODEProblem(lv, u0, tspan, p)
ode_data = Array(solve(trueodeprob, Tsit5(), saveat = tsteps))
y_train = ode_data[:, 1:35]
Step2: Define the Neural ODE architecture. Note that this step potentially offers a lot of flexibility in the number of layers/ number of units in each layer.
dudt = FastChain(FastDense(2, 50, tanh), FastDense(50, 2))
prob_node = NeuralODE(dudt, (0., 14.), Tsit5(), saveat = tsteps)
train_prob = NeuralODE(dudt, (0., 3.5), Tsit5(), saveat = tsteps[1:35])
Step3: Define the loss function for the Neural ODE.
function predict(p)
Array(train_prob(u0, p))
end
function loss(p)
sum(abs2, y_train .- predict(p))
end
Step4: Now we start integrating the Stochastic Langevin Gradient Descent(SGLD) framework.
The SGLD (Stochastic Langevin Gradient Descent) sampler is seen to have a better performance than NUTS whose tutorial is also shown in a separate document. Have a look at https://sebastiancallh.github.io/post/langevin/ for a quick introduction to SGLD.
Note that we sample from the last 2000 iterations.
sgld(∇L, θᵢ, t, a = 2.5e-3, b = 0.05, γ = 0.35) = begin
ϵ = a*(b + t)^-γ
η = ϵ.*randn(size(θᵢ))
Δθᵢ = .5ϵ*∇L + η
θᵢ .-= Δθᵢ
end
parameters = []
losses = Float64[]
grad_norm = Float64[]
θ = deepcopy(prob_node.p)
@time for t in 1:45000
grad = gradient(loss, θ)[1]
sgld(grad, θ, t)
tmp = deepcopy(θ)
append!(losses, loss(θ))
append!(grad_norm, sum(abs2, grad))
append!(parameters, [tmp])
println(loss(θ))
end
plot(losses, yscale = :log10)
plot(grad_norm, yscale =:log10)
using StatsPlots
sampled_par = parameters[43000: 45000]
Step5: Plot Retrodicted Plots (Estimation and Forecasting).
################### RETRODICTED PLOTS - TIME SERIES AND CONTOUR PLOTS ####################
_, i_min = findmin(sampled_loss)
plt = scatter(tsteps,ode_data[1,:], colour = :blue, label = "Data: u1", ylim = (-.5, 10.))
scatter!(plt, tsteps, ode_data[2,:], colour = :red, label = "Data: u2")
phase_plt = scatter(ode_data[1,:], ode_data[2,:], colour = :red, label = "Data", xlim = (-.25, 7.), ylim = (-2., 6.5))
for p in sampled_par
s = prob_node(u0, p)
plot!(plt, tsteps[1:35], s[1,1:35], colour = :blue, lalpha = 0.04, label =:none)
plot!(plt, tsteps[35:end], s[1, 35:end], colour =:purple, lalpha = 0.04, label =:none)
plot!(plt, tsteps[1:35], s[2,1:35], colour = :red, lalpha = 0.04, label=:none)
plot!(plt, tsteps[35:end], s[2,35:end], colour = :purple, lalpha = 0.04, label=:none)
plot!(phase_plt, s[1,1:35], s[2,1:35], colour =:red, lalpha = 0.04, label=:none)
plot!(phase_plt, s[1,35:end], s[2, 35:end], colour = :purple, lalpha = 0.04, label=:none)
end
plt
phase_plt
plot!(plt, [3.5], seriestype =:vline, colour = :green, linestyle =:dash,label = "Training Data End")
bestfit = prob_node(u0, sampled_par[i_min])
plot(bestfit)
plot!(plt, tsteps[1:35], bestfit[2, 1:35], colour =:black, label = "Training: Best fit prediction")
plot!(plt, tsteps[35:end], bestfit[2, 35:end], colour =:purple, label = "Forecasting: Best fit prediction")
plot!(plt, tsteps[1:35], bestfit[1, 1:35], colour =:black, label = :none)
plot!(plt, tsteps[35:end], bestfit[1, 35:end], colour =:purple, label = :none)
plot!(phase_plt,bestfit[1,1:40], bestfit[2, 1:40], colour = :black, label = "Training: Best fit prediction")
plot!(phase_plt,bestfit[1, 40:end], bestfit[2, 40:end], colour = :purple, label = "Forecasting: Best fit prediction")