Sensitivity analysis for chaotic systems (shadowing methods)

Let us define the instantaneous objective $g(u,p)$ which depends on the state u and the parameter p of the differential equation. Then, if the objective is a long-time average quantity

\[\langle g \rangle_∞ = \lim_{T \rightarrow ∞} \langle g \rangle_T,\]

where

\[\langle g \rangle_T = \frac{1}{T} \int_0^T g(u,p) \text{d}t,\]

under the assumption of ergodicity, $\langle g \rangle_∞$ only depends on p.

In the case of chaotic systems, the trajectories diverge with $O(1)$ error]. This can be seen, for instance, when solving the Lorenz system at 1e-14 tolerances with 9th order integrators and a small machine-epsilon perturbation:

using OrdinaryDiffEq

function lorenz!(du, u, p, t)
  du[1] = 10 * (u[2] - u[1])
  du[2] = u[1] * (p[1] - u[3]) - u[2]
  du[3] = u[1] * u[2] - (8 // 3) * u[3]
end

p = [28.0]
tspan = (0.0, 100.0)
u0 = [1.0, 0.0, 0.0]
prob = ODEProblem(lorenz!, u0, tspan, p)
sol = solve(prob, Vern9(), abstol = 1e-14, reltol = 1e-14)
sol2 = solve(prob, Vern9(), abstol = 1e-14 + eps(Float64), reltol = 1e-14)

Chaotic behavior of the Lorenz system

More formally, such chaotic behavior can be analyzed using tools from uncertainty quantification. This effect of diverging trajectories is known as the butterfly effect and can be formulated as "most (small) perturbations on initial conditions or parameters lead to new trajectories diverging exponentially fast from the original trajectory".

The latter statement can be roughly translated to the level of sensitivity calculation as follows: "For most initial conditions, the (homogeneous) tangent solutions grow exponentially fast."

To compute derivatives of an objective $\langle g \rangle_∞$ with respect to the parameters p of a chaotic systems, one thus encounters that "traditional" forward and adjoint sensitivity methods diverge because the tangent space diverges with a rate given by the Lyapunov exponent. Taking the average of these derivative can then also fail, i.e., one finds that the average derivative is not the derivative of the average.

Although numerically computed chaotic trajectories diverge from the true/original trajectory, the shadowing theorem guarantees that there exists an errorless trajectory with a slightly different initial condition that stays near ("shadows") the numerically computed one, see, e.g, the blog post or the non-intrusive least squares shadowing paper for more details. Essentially, the idea is to replace the ill-conditioned ODE by a well-conditioned optimization problem. Shadowing methods use the shadowing theorem within a renormalization procedure to distill the long-time effect from the joint observation of the long-time and the butterfly effect. This allows us to accurately compute derivatives w.r.t. the long-time average quantities.

The following sensealg choices exist

  • ForwardLSS(;alpha=CosWindowing(),ADKwargs...): An implementation of the forward least square shadowing method. For alpha, one can choose between two different windowing options, CosWindowing (default) and Cos2Windowing, and alpha::Number which corresponds to the weight of the time dilation term in ForwardLSS.
  • AdjointLSS(;alpha=10.0,ADKwargs...): An implementation of the adjoint-mode least square shadowing method. alpha controls the weight of the time dilation term in AdjointLSS.
  • NILSS(nseg, nstep; rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)), ADKwargs...): An implementation of the non-intrusive least squares shadowing (NILSS) method. nseg is the number of segments. nstep is the number of steps per segment.
  • NILSAS(nseg, nstep, M=nothing; rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)), ADKwargs...): An implementation of the non-intrusive least squares adjoint shadowing (NILSAS) method. nseg is the number of segments. nstep is the number of steps per segment, M >= nus + 1 has to be provided, where nus is the number of unstable covariant Lyapunov vectors.

Recommendation: Since the computational and memory costs of NILSS() scale with the number of positive (unstable) Lyapunov, it is typically less expensive than ForwardLSS(). AdjointLSS() and NILSAS() are favorable for a large number of system parameters.

As an example, for the Lorenz system with g(u,p,t) = u[3], i.e., the $z$ coordinate, as the instantaneous objective, we can use the direct interface by passing ForwardLSS as the sensealg:

function lorenz!(du,u,p,t)
  du[1] = p[1]*(u[2]-u[1])
  du[2] = u[1]*(p[2]-u[3]) - u[2]
  du[3] = u[1]*u[2] - p[3]*u[3]
end

p = [10.0, 28.0, 8/3]

tspan_init = (0.0,30.0)
tspan_attractor = (30.0,50.0)
u0 = rand(3)
prob_init = ODEProblem(lorenz!,u0,tspan_init,p)
sol_init = solve(prob_init,Tsit5())
prob_attractor = ODEProblem(lorenz!,sol_init[end],tspan_attractor,p)

g(u,p,t) = u[end]

function G(p)
  _prob = remake(prob_attractor,p=p)
  _sol = solve(_prob,Vern9(),abstol=1e-14,reltol=1e-14,saveat=0.01,sensealg=ForwardLSS(alpha=10),g=g)
  sum(getindex.(_sol.u,3))
end
dp1 = Zygote.gradient(p->G(p),p)

Alternatively, we can define the ForwardLSSProblem and solve it via shadow_forward as follows:

lss_problem = ForwardLSSProblem(sol_attractor, ForwardLSS(alpha=10), g)
resfw = shadow_forward(lss_problem)
@test res ≈ dp1[1] atol=1e-10