Enforcing Physical Constraints via Universal Differential-Algebraic Equations

As shown in the stiff ODE tutorial, differential-algebraic equations (DAEs) can be used to impose physical constraints. One way to define a DAE is as an ODE with a singular mass matrix: if we write Mu' = f(u) where the last row of M is all zeros, then that row of f defines an algebraic constraint on the states rather than a differential equation. Using NeuralODEMM, we can exploit this to define a neural ODE in which the three states must always sum to one. A full example is as follows:
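Concretely, for the system below, Mu' = f(u, p, t) with the singular M expands to

    y₁' = -k₁y₁ + k₃y₂y₃
    y₂' =  k₁y₁ - k₃y₂y₃ - k₂y₂²
    0   =  y₁ + y₂ + y₃ - 1

The zero row of M turns the third equation into an algebraic constraint that pins the sum of the states to one.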

using Lux, DiffEqFlux, Optimization, OptimizationOptimJL, DifferentialEquations, ComponentArrays, Plots

using Random
rng = Random.default_rng()

function f!(du, u, p, t)
    y₁, y₂, y₃ = u
    k₁, k₂, k₃ = p
    du[1] = -k₁*y₁ + k₃*y₂*y₃
    du[2] =  k₁*y₁ - k₃*y₂*y₃ - k₂*y₂^2
    du[3] =  y₁ + y₂ + y₃ - 1
    return nothing
end

u₀ = [1.0, 0, 0]
M = [1. 0  0
     0  1. 0
     0  0  0]

tspan = (0.0, 1.0)
p = [0.04, 3e7, 1e4]

stiff_func = ODEFunction(f!, mass_matrix = M)
prob_stiff = ODEProblem(stiff_func, u₀, tspan, p)
sol_stiff = solve(prob_stiff, Rodas5(), saveat = 0.1)

nn_dudt2 = Lux.Chain(Lux.Dense(3, 64, tanh),
                     Lux.Dense(64, 2))

pinit, st = Lux.setup(rng, nn_dudt2)

model_stiff_ndae = NeuralODEMM(nn_dudt2, (u, p, t) -> [u[1] + u[2] + u[3] - 1],
                               tspan, M, Rodas5(autodiff=false), saveat = 0.1)
model_stiff_ndae(u₀, ComponentArray(pinit), st)

function predict_stiff_ndae(p)
    return model_stiff_ndae(u₀, p, st)[1]
end

function loss_stiff_ndae(p)
    pred = predict_stiff_ndae(p)
    loss = sum(abs2, Array(sol_stiff) .- pred)
    return loss, pred
end

callback = function (p, l, pred) # callback function to observe training
  display(l)
  return false
end

l1 = first(loss_stiff_ndae(ComponentArray(pinit)))

adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction((x, p) -> loss_stiff_ndae(x), adtype)
optprob = Optimization.OptimizationProblem(optf, ComponentArray(pinit))
result_stiff = Optimization.solve(optprob, BFGS(), callback = callback, maxiters = 100)

Step-by-Step Description

Load Packages

using Lux, DiffEqFlux, Optimization, OptimizationOptimJL, DifferentialEquations, ComponentArrays, Plots

using Random
rng = Random.default_rng()

Differential Equation

First, we define our differential equations. This is the classic Robertson chemical kinetics problem, a highly stiff system, which makes the fitting difficult.

function f!(du, u, p, t)
    y₁, y₂, y₃ = u
    k₁, k₂, k₃ = p
    du[1] = -k₁*y₁ + k₃*y₂*y₃
    du[2] =  k₁*y₁ - k₃*y₂*y₃ - k₂*y₂^2
    du[3] =  y₁ + y₂ + y₃ - 1
    return nothing
end

Parameters

u₀ = [1.0, 0, 0]

M = [1. 0  0
     0  1. 0
     0  0  0]

tspan = (0.0, 1.0)

p = [0.04, 3e7, 1e4]
  • u₀ = initial conditions
  • M = semi-explicit mass matrix (the last row is all zeros because it corresponds to the constraint equation)
  • tspan = time span over which to solve
  • p = parameters k₁, k₂, and k₃ of the differential equation above
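As a quick sanity check (a minimal sketch using only the definitions above), we can evaluate f! at the initial condition; the third entry is the constraint residual, which is zero at u₀:

du = similar(u₀)   # buffer for the in-place right-hand side
f!(du, u₀, p, 0.0)
du                 # [-0.04, 0.04, 0.0]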

ODE Function, Problem and Solution

We define and solve our ODE problem to generate the "labeled" data that will be used to train our neural network.

stiff_func = ODEFunction(f!, mass_matrix = M)
prob_stiff = ODEProblem(stiff_func, u₀, tspan, p)
sol_stiff = solve(prob_stiff, Rodas5(), saveat = 0.1)

Because this is a DAE, we need to use a compatible solver; Rodas5 works well for this example.
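Since Plots is already loaded, you can visualize the training data if you wish (a minimal sketch):

plot(sol_stiff, xlabel = "t", title = "Training data")  # the three states of the DAE solution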

Neural Network Layers

Next, we create our layers using Lux.Chain. We use this instead of Flux.Chain because Lux's explicit-parameter design is better suited to SciML applications (similarly for Lux.Dense). The input to our network is the current state, starting from the initial conditions u₀.

nn_dudt2 = Lux.Chain(Lux.Dense(3, 64, tanh),
                     Lux.Dense(64, 2))

pinit, st = Lux.setup(rng, nn_dudt2)

model_stiff_ndae = NeuralODEMM(nn_dudt2, (u, p, t) -> [u[1] + u[2] + u[3] - 1],
                               tspan, M, Rodas5(autodiff=false), saveat = 0.1)
model_stiff_ndae(u₀, ComponentArray(pinit), st)

Because this is a stiff problem, we have manually imposed the sum constraint via (u, p, t) -> [u[1] + u[2] + u[3] - 1]: the neural network only learns the first two equations, while the algebraic constraint supplies the third, making the fitting easier.
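We can check that the constraint holds even for the untrained network (a minimal sketch; pred_untrained is a hypothetical name):

pred_untrained = model_stiff_ndae(u₀, ComponentArray(pinit), st)[1]
maximum(abs, sum(Array(pred_untrained), dims = 1) .- 1)  # ≈ 0 up to solver tolerance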

Prediction Function

For simplicity, we define a wrapper function that only takes in the model's parameters to make predictions.

function predict_stiff_ndae(p)
    return model_stiff_ndae(u₀, p, st)[1]
end
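For example, calling it with the untrained parameters returns the solution at the save points (a sketch; with saveat = 0.1 over (0.0, 1.0) there are 11 of them, and pred0 is a hypothetical name):

pred0 = predict_stiff_ndae(ComponentArray(pinit))
size(Array(pred0))  # (3, 11): three states at the 11 save points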

Train Parameters

Training our network requires a loss function, an optimizer and a callback function to display the progress.

Loss

We first make our predictions based on the current parameters, then calculate the loss from these predictions. In this case, we use least squares as our loss.

function loss_stiff_ndae(p)
    pred = predict_stiff_ndae(p)
    loss = sum(abs2, Array(sol_stiff) .- pred)
    return loss, pred
end

l1 = first(loss_stiff_ndae(ComponentArray(pinit)))

Notice that we are feeding the parameters of model_stiff_ndae to the loss_stiff_ndae function. ComponentArray(pinit) holds the weights and biases of our NN and is of size 386 ((3 + 1) * 64 + (64 + 1) * 2, i.e. weights plus biases for each Dense layer).
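You can confirm this count directly (a minimal sketch; Lux.parameterlength is Lux's built-in parameter counter):

Lux.parameterlength(nn_dudt2)  # 386 == (3 + 1) * 64 + (64 + 1) * 2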

Optimizer

The optimizer is BFGS (see below).

Callback

The callback function displays the loss during training. Returning false tells the optimizer to continue; returning true would halt the optimization early.

callback = function (p, l, pred) #callback function to observe training
  display(l)
  return false
end

Train

Finally, we train the network with Optimization.solve by passing the loss function (wrapped in an OptimizationFunction), the initial parameters, the optimizer, the callback, and the maximum number of iterations.

adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction((x, p) -> loss_stiff_ndae(x), adtype)
optprob = Optimization.OptimizationProblem(optf, ComponentArray(pinit))
result_stiff = Optimization.solve(optprob, BFGS(), callback = callback, maxiters = 100)
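After training, you can compare the fitted trajectory against the data (a minimal sketch; result_stiff.u holds the optimized parameters following Optimization.jl's convention, and pred_final is a hypothetical name):

pred_final = predict_stiff_ndae(result_stiff.u)
plot(sol_stiff, label = ["y₁" "y₂" "y₃"])  # ground-truth DAE solution
scatter!(sol_stiff.t, Array(pred_final)', label = ["NN y₁" "NN y₂" "NN y₃"])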

Expected Output