Sensitivity analysis for chaotic systems (shadowing methods)
Let us define the instantaneous objective $g(u,p)$ which depends on the state u
and the parameter p
of the differential equation. Then, if the objective is a long-time average quantity
\[\langle g \rangle_∞ = \lim_{T \rightarrow ∞} \langle g \rangle_T,\]
where
\[\langle g \rangle_T = \frac{1}{T} \int_0^T g(u,p) \text{d}t,\]
under the assumption of ergodicity, $\langle g \rangle_∞$ only depends on p
.
In the case of chaotic systems, the trajectories diverge with $O(1)$ error]. This can be seen, for instance, when solving the Lorenz system at 1e-14
tolerances with 9th order integrators and a small machine-epsilon perturbation:
using OrdinaryDiffEq
function lorenz!(du, u, p, t)
du[1] = 10 * (u[2] - u[1])
du[2] = u[1] * (p[1] - u[3]) - u[2]
du[3] = u[1] * u[2] - (8 // 3) * u[3]
end
p = [28.0]
tspan = (0.0, 100.0)
u0 = [1.0, 0.0, 0.0]
prob = ODEProblem(lorenz!, u0, tspan, p)
sol = solve(prob, Vern9(), abstol = 1e-14, reltol = 1e-14)
sol2 = solve(prob, Vern9(), abstol = 1e-14 + eps(Float64), reltol = 1e-14)
More formally, such chaotic behavior can be analyzed using tools from uncertainty quantification. This effect of diverging trajectories is known as the butterfly effect and can be formulated as "most (small) perturbations on initial conditions or parameters lead to new trajectories diverging exponentially fast from the original trajectory".
The latter statement can be roughly translated to the level of sensitivity calculation as follows: "For most initial conditions, the (homogeneous) tangent solutions grow exponentially fast."
To compute derivatives of an objective $\langle g \rangle_∞$ with respect to the parameters p
of a chaotic systems, one thus encounters that "traditional" forward and adjoint sensitivity methods diverge because the tangent space diverges with a rate given by the Lyapunov exponent. Taking the average of these derivative can then also fail, i.e., one finds that the average derivative is not the derivative of the average.
Although numerically computed chaotic trajectories diverge from the true/original trajectory, the shadowing theorem guarantees that there exists an errorless trajectory with a slightly different initial condition that stays near ("shadows") the numerically computed one, see, e.g, the blog post or the non-intrusive least squares shadowing paper for more details. Essentially, the idea is to replace the ill-conditioned ODE by a well-conditioned optimization problem. Shadowing methods use the shadowing theorem within a renormalization procedure to distill the long-time effect from the joint observation of the long-time and the butterfly effect. This allows us to accurately compute derivatives w.r.t. the long-time average quantities.
The following sensealg
choices exist
ForwardLSS(;alpha=CosWindowing(),ADKwargs...)
: An implementation of the forward least square shadowing method. Foralpha
, one can choose between two different windowing options,CosWindowing
(default) andCos2Windowing
, andalpha::Number
which corresponds to the weight of the time dilation term inForwardLSS
.AdjointLSS(;alpha=10.0,ADKwargs...)
: An implementation of the adjoint-mode least square shadowing method.alpha
controls the weight of the time dilation term inAdjointLSS
.NILSS(nseg, nstep; rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)), ADKwargs...)
: An implementation of the non-intrusive least squares shadowing (NILSS) method.nseg
is the number of segments.nstep
is the number of steps per segment.NILSAS(nseg, nstep, M=nothing; rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)), ADKwargs...)
: An implementation of the non-intrusive least squares adjoint shadowing (NILSAS) method.nseg
is the number of segments.nstep
is the number of steps per segment,M >= nus + 1
has to be provided, wherenus
is the number of unstable covariant Lyapunov vectors.
Recommendation: Since the computational and memory costs of NILSS()
scale with the number of positive (unstable) Lyapunov, it is typically less expensive than ForwardLSS()
. AdjointLSS()
and NILSAS()
are favorable for a large number of system parameters.
As an example, for the Lorenz system with g(u,p,t) = u[3]
, i.e., the $z$ coordinate, as the instantaneous objective, we can use the direct interface by passing ForwardLSS
as the sensealg
:
function lorenz!(du,u,p,t)
du[1] = p[1]*(u[2]-u[1])
du[2] = u[1]*(p[2]-u[3]) - u[2]
du[3] = u[1]*u[2] - p[3]*u[3]
end
p = [10.0, 28.0, 8/3]
tspan_init = (0.0,30.0)
tspan_attractor = (30.0,50.0)
u0 = rand(3)
prob_init = ODEProblem(lorenz!,u0,tspan_init,p)
sol_init = solve(prob_init,Tsit5())
prob_attractor = ODEProblem(lorenz!,sol_init[end],tspan_attractor,p)
g(u,p,t) = u[end]
function G(p)
_prob = remake(prob_attractor,p=p)
_sol = solve(_prob,Vern9(),abstol=1e-14,reltol=1e-14,saveat=0.01,sensealg=ForwardLSS(alpha=10),g=g)
sum(getindex.(_sol.u,3))
end
dp1 = Zygote.gradient(p->G(p),p)
Alternatively, we can define the ForwardLSSProblem
and solve it via shadow_forward
as follows:
lss_problem = ForwardLSSProblem(sol_attractor, ForwardLSS(alpha=10), g)
resfw = shadow_forward(lss_problem)
@test res ≈ dp1[1] atol=1e-10