Sensitivity Algorithms for Differential Equations with Automatic Differentiation (AD)
DiffEqSensitivity.jl's high level interface allows for specifying a sensitivity algorithm (sensealg
) to control the method by which solve
is differentiated in an automatic differentiation (AD) context by a compatible AD library. The underlying algorithms then use the direct interface methods, like ODEForwardSensitivityProblem
and adjoint_sensitivities
, to compute the derivatives without requiring the user to do any of the setup.
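As a quick illustration of the direct interface mentioned above, the following sketch (assuming the DiffEqSensitivity and OrdinaryDiffEq APIs, with `extract_local_sensitivities` splitting the augmented solution) computes forward sensitivities without going through an AD library:

```julia
# Sketch of the direct interface: continuous forward sensitivities for
# du/dt = p[1]*u on t in (0,1).
using DiffEqSensitivity, OrdinaryDiffEq

f(du,u,p,t) = (du[1] = p[1]*u[1])
prob = ODEForwardSensitivityProblem(f,[1.0],(0.0,1.0),[0.3])
sol = solve(prob,Tsit5())
# u holds the state trajectory, dudp the per-parameter sensitivity trajectories.
u, dudp = extract_local_sensitivities(sol)
```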
Current AD libraries whose calls are captured by the sensitivity system are:
- Zygote.jl
Using and Controlling Sensitivity Algorithms within AD
Take for example this simple differential equation solve on Lotka-Volterra:
using DiffEqSensitivity, OrdinaryDiffEq, Zygote
function fiip(du,u,p,t)
du[1] = dx = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = dy = -p[3]*u[2] + p[4]*u[1]*u[2]
end
p = [1.5,1.0,3.0,1.0]; u0 = [1.0;1.0]
prob = ODEProblem(fiip,u0,(0.0,10.0),p)
sol = solve(prob,Tsit5())
loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1))
du0,dp = Zygote.gradient(loss,u0,p)
This will compute the gradient of the loss function "sum of the values of the solution to the ODE at the timepoints saved via saveat=0.1" using an adjoint method, where du0
is the derivative of the loss function with respect to the initial condition and dp
is the derivative of the loss function with respect to the parameters.
Because the gradient is calculated by Zygote.gradient
and Zygote.jl is one of the compatible AD libraries, this derivative calculation will be captured by the sensealg
system, and one of DiffEqSensitivity.jl's adjoint overloads will be used to compute the derivative. By default, if the sensealg
keyword argument is not defined, then a smart polyalgorithm is used to automatically determine the most appropriate method for a given equation.
Likewise, the sensealg
argument can be given to directly control the method by which the derivative is computed. For example, to force the use of QuadratureAdjoint:
loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1,sensealg=QuadratureAdjoint()))
du0,dp = Zygote.gradient(loss,u0,p)
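As a sketch (reusing prob, u0, and p from the example above, with sensealg choices documented below), the gradients from a continuous adjoint and a discrete forward method can be cross-checked:

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

# Continuous adjoint with a compiled ReverseDiff vjp:
loss_adj(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1,
                           sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true))))
# Discrete forward-mode sensitivity:
loss_fwd(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1,
                           sensealg=ForwardDiffSensitivity()))

du0a,dpa = Zygote.gradient(loss_adj,u0,p)
du0b,dpb = Zygote.gradient(loss_fwd,u0,p)
# The two methods should agree to solver tolerance, e.g. isapprox(dpa,dpb;rtol=1e-3).
```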
Choosing a Sensitivity Algorithm
There are two classes of algorithms: the continuous sensitivity analysis methods, and the discrete sensitivity analysis methods (direct automatic differentiation). Generally:
- Continuous sensitivity analysis methods are more efficient, while discrete sensitivity analysis is more stable (a full discussion is given in the appendix of the arXiv comparison paper referenced below)
- Continuous sensitivity analysis methods only support a subset of equations, which currently includes:
- ODEProblem (with mass matrices for differential-algebraic equations (DAEs))
- SDEProblem
- SteadyStateProblem / NonlinearProblem
- Discrete sensitivity analysis methods only support a subset of algorithms, namely, the pure Julia solvers which are written generically.
For an analysis of which methods will be most efficient for computing the solution derivatives for a given problem, consult our analysis in this arXiv paper. A general rule of thumb is:
- ForwardDiffSensitivity is the fastest for differential equations with small numbers of parameters (<100) and can be used on any differential equation solver that is native Julia. If the chosen ODE solver is not compatible with direct automatic differentiation, ForwardSensitivity may be used instead.
- Adjoint sensitivity analysis is the fastest when the number of parameters is sufficiently large. There are three configurations of note. QuadratureAdjoint is the fastest but uses the most memory, BacksolveAdjoint uses the least memory but on very stiff problems it may be unstable and require a lot of checkpoints, while InterpolatingAdjoint is in the middle, allowing checkpointing to control total memory use.
- The methods which use direct automatic differentiation (ReverseDiffAdjoint, TrackerAdjoint, ForwardDiffSensitivity, and ZygoteAdjoint) support the full range of DifferentialEquations.jl features (SDEs, DDEs, events, etc.), but only work on native Julia solvers.
- For non-ODEs with large numbers of parameters, TrackerAdjoint in out-of-place form may be the best performer on GPUs, and ReverseDiffAdjoint otherwise. TrackerAdjoint is able to use a TrackedArray form with out-of-place functions du = f(u,p,t), but requires an Array{TrackedReal} form for f(du,u,p,t) functions mutating du. The latter has much more overhead and should be avoided if possible; thus, if solving non-ODEs with many parameters, TrackerAdjoint with an out-of-place definition may be the current best option.
Compatibility with direct automatic differentiation algorithms (ForwardDiffSensitivity
, ReverseDiffAdjoint
, etc.) can be queried using the SciMLBase.isautodifferentiable(::SciMLAlgorithm)
trait function.
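For example (a sketch; the solver types are assumed to come from OrdinaryDiffEq and Sundials):

```julia
using SciMLBase, OrdinaryDiffEq, Sundials

# Pure-Julia solvers are expected to report true, wrapped C/Fortran solvers false.
SciMLBase.isautodifferentiable(Tsit5())
SciMLBase.isautodifferentiable(CVODE_BDF())
```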
If the chosen algorithm is a continuous sensitivity analysis algorithm, then an autojacvec
argument can be given for choosing how the Jacobian-vector product (J*v
) or vector-Jacobian product (J'*v
) calculation is computed. For the forward sensitivity methods, autojacvec=true
is the most efficient, though autojacvec=false
is slightly less accurate but very close in efficiency. For adjoint methods it's more complicated and dependent on the way that the user's f
function is implemented:
- EnzymeVJP() is the most efficient if it's applicable on your equation.
- If your function has no branching (no if statements) but uses mutation, ReverseDiffVJP(true) will be the most efficient after Enzyme. Otherwise use ReverseDiffVJP(), though you may wish to eliminate mutation, since without tape compilation this choice can be slow.
- If you're on the CPU or GPU and your function is very vectorized and has no mutation, choose ZygoteVJP().
- Otherwise, fall back to TrackerVJP() if Zygote does not support the function.
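This decision process can be put into code as follows (a hedged sketch; the adjoint and vjp constructors are documented below):

```julia
using DiffEqSensitivity

# Mutating f with no branches: a compiled ReverseDiff tape is typically
# the fastest choice after Enzyme.
sensealg_mut = InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true))

# Vectorized, non-mutating f (CPU or GPU): Zygote-based vjp.
sensealg_oop = InterpolatingAdjoint(autojacvec=ZygoteVJP())

# Enzyme-based vjp, the most efficient when applicable:
sensealg_enz = QuadratureAdjoint(autojacvec=EnzymeVJP())
```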
Special Notes on Non-ODE Differential Equation Problems
While all of the choices are compatible with ordinary differential equations, specific notices apply to other forms:
Differential-Algebraic Equations
We note that while all three continuous adjoint methods are compatible with index-1 DAEs via the derivation in the universal differential equations paper (note the reinitialization), we do not recommend BacksolveAdjoint
on DAEs because the stiffness inherent in these problems tends to cause major difficulties with the accuracy of the backwards solution due to reinitialization of the algebraic variables.
Stochastic Differential Equations
We note that all of the adjoints except QuadratureAdjoint
are applicable to stochastic differential equations.
Delay Differential Equations
We note that only the discretize-then-optimize methods are applicable to delay differential equations. Constant lag and variable lag delay differential equation parameters can be estimated, but the lag times themselves are unable to be estimated through these automatic differentiation techniques.
Hybrid Equations (Equations with events/callbacks) and Jump Equations
ForwardDiffSensitivity
can differentiate code with callbacks when convert_tspan=true
. ForwardSensitivity
is not compatible with hybrid equations. The shadowing methods are not compatible with callbacks. All methods based on discrete adjoint sensitivity analysis via automatic differentiation, like ReverseDiffAdjoint
and TrackerAdjoint
, are fully compatible with events. This applies to ODEs, SDEs, DAEs, and DDEs. The continuous adjoint sensitivities BacksolveAdjoint
, InterpolatingAdjoint
, and QuadratureAdjoint
are compatible with events for ODEs. BacksolveAdjoint
and InterpolatingAdjoint
can also handle events for SDEs. Use BacksolveAdjoint
if the event terminates the time evolution and several states are saved. Currently, the continuous adjoint sensitivities do not support multiple events per time point.
Manual VJPs
Note that when defining your differential equation the vjp can be manually overwritten by providing the AbstractSciMLFunction
definition with a vjp(u,p,t)
that returns a tuple f(u,p,t),v->J*v
in the form of ChainRules.jl. When this is done, the choice of ZygoteVJP
will utilize your VJP function during the internal steps of the adjoint. This is useful for models where automatic differentiation may have trouble producing optimal code. This can be paired with ModelingToolkit.jl for producing hyper-optimized, sparse, and parallel VJP functions utilizing the automated symbolic conversions.
Sensitivity Algorithms
The following algorithm choices exist for sensealg
. See the sensitivity mathematics page for more details on the definition of the methods.
DiffEqSensitivity.ForwardSensitivity
— TypeForwardSensitivity{CS,AD,FDT} <: AbstractForwardSensitivityAlgorithm{CS,AD,FDT}
An implementation of continuous forward sensitivity analysis for propagating derivatives by solving the extended ODE. When used within adjoint differentiation (i.e. via Zygote), this will cause forward differentiation of the solve
call within the reverse-mode automatic differentiation environment.
Constructor
function ForwardSensitivity(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=autodiff,
autojacmat=false)
Keyword Arguments
- autodiff: Use automatic differentiation in the internal sensitivity algorithm computations. Default is true.
- chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
- autojacvec: Calculate the Jacobian-vector product via automatic differentiation with special seeding.
- diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
Further details:
- If autodiff=true and autojacvec=true, then the one-chunk J*v forward-mode directional derivative calculation trick is used to compute the product without constructing the Jacobian (via ForwardDiff.jl).
- If autodiff=false and autojacvec=true, then the numerical directional derivative trick (f(x+epsilon*v)-f(x))/epsilon is used to compute J*v without constructing the Jacobian.
- If autodiff=true and autojacvec=false, then the Jacobian is constructed via chunked forward-mode automatic differentiation (via ForwardDiff.jl).
- If autodiff=false and autojacvec=false, then the Jacobian is constructed via finite differences via FiniteDiff.jl.
SciMLProblem Support
This sensealg
only supports ODEProblem
s without callbacks (events).
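A usage sketch (assuming the API shown earlier on this page; continuous forward sensitivity produces parameter gradients, so the loss below closes over the initial condition):

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

function fiip(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
p = [1.5,1.0,3.0,1.0]; u0 = [1.0,1.0]
prob = ODEProblem(fiip,u0,(0.0,10.0),p)

# Forward differentiation of solve inside Zygote's reverse pass:
loss(p) = sum(solve(prob,Tsit5(),p=p,saveat=0.1,sensealg=ForwardSensitivity()))
dp, = Zygote.gradient(loss,p)
```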
DiffEqSensitivity.ForwardDiffSensitivity
— TypeForwardDiffSensitivity{CS,CTS} <: AbstractForwardSensitivityAlgorithm{CS,Nothing,Nothing}
An implementation of discrete forward sensitivity analysis through ForwardDiff.jl. When used within adjoint differentiation (i.e. via Zygote), this will cause forward differentiation of the solve
call within the reverse-mode automatic differentiation environment.
Constructor
ForwardDiffSensitivity(;chunk_size=0,convert_tspan=nothing)
Keyword Arguments
- chunk_size: the chunk size used by ForwardDiff for computing the Jacobian, i.e. the number of simultaneous columns computed.
- convert_tspan: whether to convert time to also be Dual valued. By default this is nothing, which will only convert if callbacks are found. Conversion is required in order to accurately differentiate callbacks (hybrid equations).
SciMLProblem Support
This sensealg
supports any SciMLProblem
, provided that the solver algorithm is SciMLBase.isautodifferentiable
. Note that ForwardDiffSensitivity
can accurately differentiate code with callbacks only when convert_tspan=true
.
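For instance, differentiating through an event (a sketch; the ContinuousCallback API from DifferentialEquations.jl is assumed, and the model is illustrative):

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

f(du,u,p,t) = (du[1] = -p[1]*u[1])
prob = ODEProblem(f,[1.0],(0.0,2.0),[1.0])

# Event: when u[1] decays to 0.5, double it.
cb = ContinuousCallback((u,t,integ) -> u[1] - 0.5,
                        integ -> (integ.u[1] *= 2))

# convert_tspan=true makes time Dual-valued so the event time is differentiated.
loss(p) = sum(solve(prob,Tsit5(),p=p,callback=cb,saveat=0.1,
                    sensealg=ForwardDiffSensitivity(convert_tspan=true)))
dp, = Zygote.gradient(loss,[1.0])
```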
DiffEqSensitivity.BacksolveAdjoint
— TypeBacksolveAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}
An implementation of adjoint sensitivity analysis using a backwards solution of the ODE. By default this algorithm will use the values from the forward pass to perturb the backwards solution to the correct spot, allowing reduced memory (O(1) memory). Checkpointing stabilization is included for additional numerical stability over the naive implementation.
Constructor
BacksolveAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,
checkpointing=true, noisemixing=false)
Keyword Arguments
- autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
- chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
- diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
- autojacvec: Calculate the vector-Jacobian product (J'*v) via automatic differentiation with special seeding. The default is true. The total set of choices are:
  - false: the Jacobian is constructed via FiniteDiff.jl
  - true: the Jacobian is constructed via ForwardDiff.jl
  - TrackerVJP: Uses Tracker.jl for the vjp.
  - ZygoteVJP: Uses Zygote.jl for the vjp.
  - EnzymeVJP: Uses Enzyme.jl for the vjp.
  - ReverseDiffVJP(compile=false): Uses ReverseDiff.jl for the vjp. compile is a boolean for whether to precompile the tape, which should only be done if there are no branches (if or while statements) in the f function.
- checkpointing: whether checkpointing is enabled for the reverse pass. Defaults to true.
- noisemixing: Handle noise processes that are not of the form du[i] = f(u[i]). For example, to compute the sensitivities of an SDE with diagonal diffusion

function g_mixing!(du,u,p,t)
  du[1] = p[3]*u[1] + p[4]*u[2]
  du[2] = p[3]*u[1] + p[4]*u[2]
  nothing
end

correctly, noisemixing=true must be enabled. The default is false.
For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.
Applicability of Backsolve and Caution
When BacksolveAdjoint
is applicable, it is a fast method and requires the least memory. However, one must be cautious because not all ODEs are stable under backwards integration by the majority of ODE solvers. An example of such an equation is the Lorenz equation. Notice that if one solves the Lorenz equation forward and then in reverse with any adaptive time step and non-reversible integrator, then the backwards solution diverges from the forward solution. As a quick demonstration:
using OrdinaryDiffEq
function lorenz(du,u,p,t)
du[1] = 10.0*(u[2]-u[1])
du[2] = u[1]*(28.0-u[3]) - u[2]
du[3] = u[1]*u[2] - (8/3)*u[3]
end
u0 = [1.0;0.0;0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz,u0,tspan)
sol = solve(prob,Tsit5(),reltol=1e-12,abstol=1e-12)
prob2 = ODEProblem(lorenz,sol[end],(100.0,0.0))
sol2 = solve(prob2,Tsit5(),reltol=1e-12,abstol=1e-12)
@show sol2[end]-u0 #[-3.22091, -1.49394, 21.3435]
Thus one should check the stability of the backsolve on their type of problem before enabling this method. Additionally, using checkpointing with backsolve can be a low memory way to stabilize it.
For more details on this topic, see Stiff Neural Ordinary Differential Equations.
Checkpointing
To improve the numerical stability of the reverse pass, BacksolveAdjoint
includes a checkpointing feature. If sol.u
is a time series, then whenever a time sol.t
is hit while reversing, a callback will replace the reversing ODE portion with sol.u[i]
. This nudges the solution back onto the appropriate trajectory and reduces the numerical error caused by drift.
SciMLProblem Support
This sensealg
only supports ODEProblem
s, SDEProblem
s, and RODEProblem
s. This sensealg
supports callback functions (events).
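A usage sketch with checkpointing enabled (the model here is illustrative):

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

f(du,u,p,t) = (du[1] = -p[1]*u[1])
prob = ODEProblem(f,[1.0],(0.0,1.0),[2.0])

# O(1)-memory adjoint; the saved forward values act as checkpoints that
# stabilize the backwards solve.
loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1,
                       sensealg=BacksolveAdjoint(checkpointing=true)))
du0,dp = Zygote.gradient(loss,[1.0],[2.0])
```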
References
ODE: Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385
Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)
Chen, R.T.Q. and Rubanova, Y. and Bettencourt, J. and Duvenaud, D. K., Neural ordinary differential equations. In Advances in neural information processing systems, pp. 6571–6583 (2018)
Pontryagin, L. S. and Mishchenko, E.F. and Boltyanskii, V.G. and Gamkrelidze, R.V. The mathematical theory of optimal processes. Routledge, (1962)
Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892
DAE: Cao, Y. and Li, S. and Petzold, L. and Serban, R., Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution, SIAM journal on scientific computing 24 pp: 1076-1089 (2003)
SDE: Gobet, E. and Munos, R., Sensitivity Analysis Using Ito-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control, SIAM Journal on control and optimization, 43, pp. 1676-1713 (2005)
Li, X. and Wong, T.-K. L.and Chen, R. T. Q. and Duvenaud, D., Scalable Gradients for Stochastic Differential Equations, PMLR 108, pp. 3870-3882 (2020), http://proceedings.mlr.press/v108/li20i.html
DiffEqSensitivity.InterpolatingAdjoint
— TypeInterpolatingAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}
An implementation of adjoint sensitivity analysis which uses the interpolation of the forward solution for the reverse solve vector-Jacobian products. By default it requires a dense solution of the forward pass and will internally ignore saving arguments during the gradient calculation. When checkpointing is enabled it will only require the memory to interpolate between checkpoints.
Constructor
function InterpolatingAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,
checkpointing=false, noisemixing=false)
Keyword Arguments
- autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
- chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
- diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
- autojacvec: Calculate the vector-Jacobian product (J'*v) via automatic differentiation with special seeding. The default is true. The total set of choices are:
  - false: the Jacobian is constructed via FiniteDiff.jl
  - true: the Jacobian is constructed via ForwardDiff.jl
  - TrackerVJP: Uses Tracker.jl for the vjp.
  - ZygoteVJP: Uses Zygote.jl for the vjp.
  - EnzymeVJP: Uses Enzyme.jl for the vjp.
  - ReverseDiffVJP(compile=false): Uses ReverseDiff.jl for the vjp. compile is a boolean for whether to precompile the tape, which should only be done if there are no branches (if or while statements) in the f function.
- checkpointing: whether checkpointing is enabled for the reverse pass. Defaults to false.
- noisemixing: Handle noise processes that are not of the form du[i] = f(u[i]). For example, to compute the sensitivities of an SDE with diagonal diffusion

function g_mixing!(du,u,p,t)
  du[1] = p[3]*u[1] + p[4]*u[2]
  du[2] = p[3]*u[1] + p[4]*u[2]
  nothing
end

correctly, noisemixing=true must be enabled. The default is false.
For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.
Checkpointing
To reduce the memory usage of the reverse pass, InterpolatingAdjoint
includes a checkpointing feature. If sol
is dense
, checkpointing is ignored and the continuous solution is used for calculating u(t)
at arbitrary time points. If checkpointing=true
and sol
is not dense
, then dense intervals between sol.t[i]
and sol.t[i+1]
are reconstructed on-demand for calculating u(t)
at arbitrary time points. This reduces the total memory requirement to only the cost of holding the dense solution over the largest time interval (in terms of number of required steps). The total compute cost is no more than double the original forward compute cost.
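A checkpointed usage sketch (the model is illustrative); here checkpointing reconstructs the dense intervals between the saved steps on demand:

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

f(du,u,p,t) = (du[1] = -p[1]*u[1])
prob = ODEProblem(f,[1.0],(0.0,1.0),[2.0])

loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1,
                       sensealg=InterpolatingAdjoint(checkpointing=true)))
du0,dp = Zygote.gradient(loss,[1.0],[2.0])
```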
SciMLProblem Support
This sensealg
only supports ODEProblem
s, SDEProblem
s, and RODEProblem
s. This sensealg
supports callbacks (events).
References
Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385
Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)
Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892
DiffEqSensitivity.QuadratureAdjoint
— TypeQuadratureAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}
An implementation of adjoint sensitivity analysis which develops a full continuous solution of the reverse solve in order to perform a post-ODE quadrature. This method requires the dense solution and will ignore saving arguments during the gradient calculation. The tolerances in the constructor control the inner quadrature. The inner quadrature uses a ReverseDiff vjp if autojacvec, with compile=false
by default, but it can compile the tape under the same circumstances as ReverseDiffVJP
.
This method is O(n^3 + p) for stiff / implicit equations (as opposed to the O((n+p)^3) scaling of BacksolveAdjoint and InterpolatingAdjoint), and thus is much more compute efficient. However, it requires holding a dense reverse pass and is thus memory intensive.
Constructor
function QuadratureAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,abstol=1e-6,
reltol=1e-3,compile=false)
Keyword Arguments
- autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
- chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
- diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
- autojacvec: Calculate the vector-Jacobian product (J'*v) via automatic differentiation with special seeding. The default is true. The total set of choices are:
  - false: the Jacobian is constructed via FiniteDiff.jl
  - true: the Jacobian is constructed via ForwardDiff.jl
  - TrackerVJP: Uses Tracker.jl for the vjp.
  - ZygoteVJP: Uses Zygote.jl for the vjp.
  - EnzymeVJP: Uses Enzyme.jl for the vjp.
  - ReverseDiffVJP(compile=false): Uses ReverseDiff.jl for the vjp. compile is a boolean for whether to precompile the tape, which should only be done if there are no branches (if or while statements) in the f function.
- abstol: absolute tolerance for the quadrature calculation.
- reltol: relative tolerance for the quadrature calculation.
- compile: whether to compile the vjp calculation for the integrand calculation. See ReverseDiffVJP for more details.
For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.
SciMLProblem Support
This sensealg
only supports ODEProblem
s. This sensealg
supports events (callbacks).
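A sketch tightening the inner quadrature tolerances (the model is illustrative):

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

f(du,u,p,t) = (du[1] = -p[1]*u[1])
prob = ODEProblem(f,[1.0],(0.0,1.0),[2.0])

loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1,
                       sensealg=QuadratureAdjoint(abstol=1e-8,reltol=1e-6,
                                                  autojacvec=ReverseDiffVJP(true))))
du0,dp = Zygote.gradient(loss,[1.0],[2.0])
```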
References
Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385
Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)
Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892
Kim, S., Ji, W., Deng, S., Ma, Y., & Rackauckas, C. (2021). Stiff neural ordinary differential equations. Chaos: An Interdisciplinary Journal of Nonlinear Science, 31(9), 093122.
DiffEqSensitivity.ReverseDiffAdjoint
— TypeReverseDiffAdjoint <: AbstractAdjointSensitivityAlgorithm{nothing,true,nothing}
An implementation of discrete adjoint sensitivity analysis using the ReverseDiff.jl tracing-based AD. Supports in-place functions through an Array of Structs formulation, and supports out of place through struct of arrays.
Constructor
ReverseDiffAdjoint()
SciMLProblem Support
This sensealg
supports any DEProblem
if the algorithm is SciMLBase.isautodifferentiable
. Requires that the state variables are CPU-based Array
types.
DiffEqSensitivity.TrackerAdjoint
— TypeTrackerAdjoint <: AbstractAdjointSensitivityAlgorithm{nothing,true,nothing}
An implementation of discrete adjoint sensitivity analysis using the Tracker.jl tracing-based AD. Supports in-place functions through an Array of Structs formulation, and supports out of place through struct of arrays.
Constructor
TrackerAdjoint()
SciMLProblem Support
This sensealg
supports any DEProblem
if the algorithm is SciMLBase.isautodifferentiable
. Compatible with a limited subset of AbstractArray
types for u0
, including CuArrays
.
TrackerAdjoint is incompatible with stiff ODE solvers that use forward-mode automatic differentiation for their Jacobians. Thus, for example, TRBDF2()
will error. Instead, use autodiff=false
, i.e. TRBDF2(autodiff=false)
. This will only remove the forward-mode automatic differentiation of the Jacobian construction, not the reverse-mode AD usage, and thus performance will still be nearly the same, though Jacobian accuracy may suffer, which could cause more steps to be required.
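For example (a sketch; the stiff pure-Julia solver and model are illustrative):

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

f(du,u,p,t) = (du[1] = -p[1]*u[1])
prob = ODEProblem(f,[1.0],(0.0,1.0),[100.0])  # moderately stiff decay

# TRBDF2() would error under TrackerAdjoint; disabling its forward-mode
# Jacobian AD makes the combination compatible.
loss(p) = sum(solve(prob,TRBDF2(autodiff=false),p=p,saveat=0.1,
                    sensealg=TrackerAdjoint()))
dp, = Zygote.gradient(loss,[100.0])
```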
DiffEqSensitivity.ZygoteAdjoint
— TypeZygoteAdjoint <: AbstractAdjointSensitivityAlgorithm{nothing,true,nothing}
An implementation of discrete adjoint sensitivity analysis using the Zygote.jl source-to-source AD directly on the differential equation solver.
Constructor
ZygoteAdjoint()
SciMLProblem Support
Currently fails on almost every solver.
DiffEqSensitivity.ForwardLSS
— TypeForwardLSS{CS,AD,FDT,RType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}
An implementation of the discrete, forward-mode least squares shadowing (LSS) method. LSS replaces the ill-conditioned initial value problem (ODEProblem
) for chaotic systems by a well-conditioned least-squares problem. This allows for computing sensitivities of long-time averaged quantities with respect to the parameters of the ODEProblem
. The computational cost of LSS scales as (number of states x number of time steps). Converges to the correct sensitivity at a rate of T^(-1/2)
, where T
is the time of the trajectory. See NILSS()
and NILSAS()
for a more efficient non-intrusive formulation.
Constructor
ForwardLSS(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
LSSregularizer=TimeDilation(10.0,0.0,0.0),
g=nothing)
Keyword Arguments
- autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
- chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
- diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
- LSSregularizer: Using LSSregularizer, one can choose between three different regularization routines. The default choice is TimeDilation(10.0,0.0,0.0).
  - CosWindowing(): cos windowing of the time grid, i.e. the time grid (saved time steps) is transformed using a cosine.
  - Cos2Windowing(): cos^2 windowing of the time grid.
  - TimeDilation(alpha::Number,t0skip::Number,t1skip::Number): Corresponds to a time dilation. alpha controls the weight. t0skip and t1skip indicate the times truncated at the beginning and end of the trajectory, respectively.
- g: instantaneous objective function of the long-time averaged objective.
SciMLProblem Support
This sensealg
only supports ODEProblem
s. This sensealg
does not support events (callbacks). This sensealg
assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.
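A hedged sketch of the shadowing workflow (the ForwardLSSProblem/shadow_forward interface of DiffEqSensitivity is assumed): sensitivity of the long-time average of z in the Lorenz system with respect to the parameters.

```julia
using DiffEqSensitivity, OrdinaryDiffEq

function lorenz!(du,u,p,t)
  du[1] = p[1]*(u[2]-u[1])
  du[2] = u[1]*(p[2]-u[3]) - u[2]
  du[3] = u[1]*u[2] - p[3]*u[3]
end
p = [10.0, 28.0, 8/3]
prob = ODEProblem(lorenz!,[1.0,0.0,0.0],(0.0,50.0),p)
sol_attractor = solve(prob,Tsit5(),saveat=0.01)

# Instantaneous objective whose long-time average <z> is differentiated.
g(u,p,t) = u[3]
lss = ForwardLSSProblem(sol_attractor, ForwardLSS(g=g))
res = shadow_forward(lss)  # one sensitivity entry per parameter
```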
References
Wang, Q., Hu, R., and Blonigan, P. Least squares shadowing sensitivity analysis of chaotic limit cycle oscillations. Journal of Computational Physics, 267, 210-224 (2014).
Wang, Q., Convergence of the Least Squares Shadowing Method for Computing Derivative of Ergodic Averages, SIAM Journal on Numerical Analysis, 52, 156–170 (2014).
Blonigan, P., Gomez, S., Wang, Q., Least Squares Shadowing for sensitivity analysis of turbulent fluid flows, in: 52nd Aerospace Sciences Meeting, 1–24 (2014).
DiffEqSensitivity.AdjointLSS
— TypeAdjointLSS{CS,AD,FDT,RType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}
An implementation of the discrete, adjoint-mode least squares shadowing method. LSS replaces the ill-conditioned initial value problem (ODEProblem
) for chaotic systems by a well-conditioned least-squares problem. This allows for computing sensitivities of long-time averaged quantities with respect to the parameters of the ODEProblem
. The computational cost of LSS scales as (number of states x number of time steps). Converges to the correct sensitivity at a rate of T^(-1/2)
, where T
is the time of the trajectory. See NILSS()
and NILSAS()
for a more efficient non-intrusive formulation.
Constructor
AdjointLSS(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
LSSregularizer=TimeDilation(10.0,0.0,0.0),
g=nothing)
Keyword Arguments
- autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
- chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
- diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
- LSSregularizer: Using LSSregularizer, one can choose between different regularization routines. The default choice is TimeDilation(10.0,0.0,0.0).
  - TimeDilation(alpha::Number,t0skip::Number,t1skip::Number): Corresponds to a time dilation. alpha controls the weight. t0skip and t1skip indicate the times truncated at the beginning and end of the trajectory, respectively. The default value for t0skip and t1skip is zero(alpha).
- g: instantaneous objective function of the long-time averaged objective.
SciMLProblem Support
This sensealg
only supports ODEProblem
s. This sensealg
does not support events (callbacks). This sensealg
assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.
References
Wang, Q., Hu, R., and Blonigan, P. Least squares shadowing sensitivity analysis of chaotic limit cycle oscillations. Journal of Computational Physics, 267, 210-224 (2014).
DiffEqSensitivity.NILSS
— Typestruct NILSS{CS,AD,FDT,RNG,nType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}
An implementation of the forward-mode, continuous non-intrusive least squares shadowing method. NILSS
allows for computing sensitivities of long-time averaged quantities with respect to the parameters of an ODEProblem
by constraining the computation to the unstable subspace. NILSS
employs the continuous-time ForwardSensitivity
method as the tangent solver. To avoid an exponential blow-up of the (homogeneous and inhomogeneous) tangent solutions, the trajectory should be divided into sufficiently small segments, where the tangent solutions are rescaled on the interfaces. The computational and memory costs of NILSS scale with the number of unstable (positive) Lyapunov exponents (instead of the number of states, as in the LSS method). NILSS
avoids the explicit construction of the Jacobian at each time step and thus should generally be preferred (for large system sizes) over ForwardLSS
.
Constructor
NILSS(nseg, nstep; nus = nothing,
rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)),
chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=autodiff,
g=nothing)
Arguments
- `nseg`: Number of segments on the full time interval on the attractor.
- `nstep`: Number of steps on each segment.
Keyword Arguments
- `nus`: Dimension of the unstable subspace. Default is `nothing`. `nus` must be smaller than or equal to the state dimension (`length(u0)`). With the default choice, `nus = length(u0) - 1` will be set at compile time.
- `rng`: (Pseudo) random number generator. Used for initializing the homogeneous tangent states (`w`). Default is `Xorshifts.Xoroshiro128Plus(rand(UInt64))`.
- `autodiff`: Use automatic differentiation in the internal sensitivity algorithm computations. Default is `true`.
- `chunk_size`: Chunk size for forward-mode differentiation if full Jacobians are built (`autojacvec=false` and `autodiff=true`). Default is `0` for automatic choice of chunk size.
- `autojacvec`: Calculate the Jacobian-vector product via automatic differentiation with special seeding.
- `diff_type`: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with `autodiff=false`.
- `g`: Instantaneous objective function of the long-time averaged objective.
SciMLProblem Support
This `sensealg` only supports `ODEProblem`s. This `sensealg` does not support events (callbacks). This `sensealg` assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.
References
Ni, A., Blonigan, P. J., Chater, M., Wang, Q., Zhang, Z., Sensitivity analysis on chaotic dynamical system by Non-Intrusive Least Square Shadowing (NI-LSS), in: 46th AIAA Fluid Dynamics Conference, AIAA AVIATION Forum (AIAA 2016-4399), American Institute of Aeronautics and Astronautics, 1–16 (2016).
Ni, A., and Wang, Q. Sensitivity analysis on chaotic dynamical systems by Non-Intrusive Least Squares Shadowing (NILSS). Journal of Computational Physics 347, 56-77 (2017).
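As an illustration of how these pieces fit together, the following is a hedged sketch of using `NILSS` through the direct `NILSSProblem`/`shadow_forward` interface on the chaotic Lorenz system. The segmentation values and the objective `g` are illustrative, not recommendations:

```julia
using DiffEqSensitivity, OrdinaryDiffEq

# Chaotic Lorenz system with classical parameter values.
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end

p = [10.0, 28.0, 8 / 3]
u0 = [1.0, 0.0, 0.0]
prob = ODEProblem(lorenz!, u0, (0.0, 60.0), p)

# Instantaneous objective: the long-time average of z = u[3].
g(u, p, t) = u[end]

# Divide the trajectory into segments; the tangent solutions are
# rescaled on the interfaces to prevent exponential blow-up.
nseg, nstep = 60, 100  # illustrative segmentation
nilss_prob = NILSSProblem(prob, NILSS(nseg, nstep; g = g))
dp = DiffEqSensitivity.shadow_forward(nilss_prob)  # sensitivity of the averaged g w.r.t. p
```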
DiffEqSensitivity.NILSAS — Type

NILSAS{CS,AD,FDT,RNG,SENSE,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}
An implementation of the adjoint-mode, continuous, non-intrusive adjoint least squares shadowing method. `NILSAS` allows for computing sensitivities of long-time averaged quantities with respect to the parameters of an `ODEProblem` by constraining the computation to the unstable subspace. `NILSAS` employs DiffEqSensitivity.jl's continuous adjoint sensitivity methods on each segment to compute the (homogeneous and inhomogeneous) adjoint solutions. To avoid an exponential blow-up of the adjoint solutions, the trajectory should be divided into sufficiently small segments, where the adjoint solutions are rescaled on the interfaces. The computational and memory cost of NILSAS scale with the number of unstable, adjoint Lyapunov exponents (instead of the number of states, as in the LSS method). `NILSAS` avoids the explicit construction of the Jacobian at each time step and thus should generally be preferred (for large system sizes) over `AdjointLSS`. `NILSAS` is favourable over `NILSS` for many parameters, because NILSAS computes the gradient with respect to multiple parameters with negligible additional cost.
Constructor
NILSAS(nseg, nstep, M=nothing; rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)),
adjoint_sensealg = BacksolveAdjoint(autojacvec=ReverseDiffVJP()),
chunk_size=0,autodiff=true,
diff_type=Val{:central},
g=nothing
)
Arguments
- `nseg`: Number of segments on the full time interval on the attractor.
- `nstep`: Number of steps on each segment.
- `M`: Number of homogeneous adjoint solutions. This number must be greater than or equal to the number of (positive, adjoint) Lyapunov exponents. Default is `nothing`.
Keyword Arguments
- `rng`: (Pseudo) random number generator. Used for initializing the terminal conditions of the homogeneous adjoint states (`w`). Default is `Xorshifts.Xoroshiro128Plus(rand(UInt64))`.
- `adjoint_sensealg`: Continuous adjoint sensitivity method used to compute the homogeneous and inhomogeneous adjoint solutions on each segment. Default is `BacksolveAdjoint(autojacvec=ReverseDiffVJP())`.
- `autojacvec`: Calculate the vector-Jacobian product (`J'*v`) via automatic differentiation with special seeding. The default is `true`. The total set of choices is:
  - `false`: the Jacobian is constructed via FiniteDiff.jl
  - `true`: the Jacobian is constructed via ForwardDiff.jl
  - `TrackerVJP`: Uses Tracker.jl for the vjp.
  - `ZygoteVJP`: Uses Zygote.jl for the vjp.
  - `EnzymeVJP`: Uses Enzyme.jl for the vjp.
  - `ReverseDiffVJP(compile=false)`: Uses ReverseDiff.jl for the vjp. `compile` is a boolean for whether to precompile the tape, which should only be done if there are no branches (`if` or `while` statements) in the `f` function.
- `autodiff`: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to `true`.
- `chunk_size`: Chunk size for forward-mode differentiation if full Jacobians are built (`autojacvec=false` and `autodiff=true`). Default is `0` for automatic choice of chunk size.
- `diff_type`: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with `autodiff=false`.
- `g`: Instantaneous objective function of the long-time averaged objective.
SciMLProblem Support
This `sensealg` only supports `ODEProblem`s. This `sensealg` does not support events (callbacks). This `sensealg` assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.
References
Ni, A., and Talnikar, C., Adjoint sensitivity analysis on chaotic dynamical systems by Non-Intrusive Least Squares Adjoint Shadowing (NILSAS). Journal of Computational Physics 395, 690-709 (2019).
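Analogously, here is a hedged sketch of the adjoint-mode workflow through the direct `NILSASProblem`/`shadow_adjoint` interface; the segmentation values and the choice of `M` are illustrative:

```julia
using DiffEqSensitivity, OrdinaryDiffEq

# Chaotic Lorenz system again as the test problem.
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end

p = [10.0, 28.0, 8 / 3]
prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0], (0.0, 60.0), p)

# Instantaneous objective: long-time average of z = u[3].
g(u, p, t) = u[end]

# M must be at least the number of positive (adjoint) Lyapunov exponents.
nseg, nstep, M = 60, 100, 2  # illustrative values
nilsas_prob = NILSASProblem(prob, NILSAS(nseg, nstep, M; g = g))
dp = DiffEqSensitivity.shadow_adjoint(nilsas_prob)  # gradient of the averaged g w.r.t. p
```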
Vector-Jacobian Product (VJP) Choices
DiffEqSensitivity.ZygoteVJP — Type

ZygoteVJP <: VJPChoice
Uses Zygote.jl to compute vector-Jacobian products. Tends to be the fastest VJP method if the ODE/DAE/SDE/DDE is written with mostly vectorized functions (like neural networks and other layers from Flux.jl) and the `f` function is given out-of-place. If the `f` function is in-place, then `Zygote.Buffer` arrays are used internally, which can greatly reduce the performance of the VJP method.
Constructor
ZygoteVJP(compile=false)
DiffEqSensitivity.EnzymeVJP — Type

EnzymeVJP <: VJPChoice
Uses Enzyme.jl to compute vector-Jacobian products. It is the fastest VJP whenever applicable, though Enzyme.jl currently has low coverage of the Julia programming language; for example, it restricts the user-defined `f` function from doing things like requiring garbage collection or making calls to BLAS/LAPACK. However, mutation is supported, meaning that in-place `f` with fully mutating, non-allocating code will work with Enzyme (provided no high-level calls to C like BLAS/LAPACK are used), and this will be the most efficient adjoint implementation.
Constructor
EnzymeVJP(compile=false)
DiffEqSensitivity.TrackerVJP — Type

TrackerVJP <: VJPChoice
Uses Tracker.jl to compute vector-Jacobian products. If `f` is in-place, then it uses an array-of-structs formulation to do scalarized reverse mode, while if `f` is out-of-place, then it uses an array-based reverse mode.

Not as efficient as `ReverseDiffVJP`, but supports GPUs when doing array-based reverse mode.
Constructor
TrackerVJP(compile=false)
DiffEqSensitivity.ReverseDiffVJP — Type

ReverseDiffVJP{compile} <: VJPChoice
Uses ReverseDiff.jl to compute vector-Jacobian products. If `f` is in-place, then it uses an array-of-structs formulation to do scalarized reverse mode, while if `f` is out-of-place, then it uses an array-based reverse mode.

Usually the fastest option when scalarized operations exist in the `f` function (as in scientific machine learning applications like Universal Differential Equations) and the boolean compilation is enabled (i.e. `ReverseDiffVJP(true)`), whenever `EnzymeVJP` fails on a given choice of `f`.
Does not support GPUs (CuArrays).
Constructor
ReverseDiffVJP(compile=false)
Keyword Arguments
- `compile`: Whether to cache the compilation of the reverse tape. This heavily increases the performance of the method, but requires that the `f` function of the ODE/DAE/SDE/DDE has no branching.
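To make the VJP choice concrete, here is a minimal sketch of selecting a VJP inside an adjoint `sensealg`, reusing the in-place Lotka-Volterra system from earlier on this page (solver and `saveat` values are illustrative). Since `fiip` is scalarized, in-place, and branch-free, the compiled tape `ReverseDiffVJP(true)` is a natural fit:

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

function fiip(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

p = [1.5, 1.0, 3.0, 1.0]; u0 = [1.0, 1.0]
prob = ODEProblem(fiip, u0, (0.0, 10.0), p)

# Scalarized, branch-free, in-place f: compile the ReverseDiff tape.
loss(u0, p) = sum(solve(prob, Tsit5(), u0 = u0, p = p, saveat = 0.1,
                        sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true))))
du0, dp = Zygote.gradient(loss, u0, p)
```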
More Details on Sensitivity Algorithm Choices
The following section describes, in more detail, the considerations involved in choosing a sensitivity algorithm.
Optimize-then-Discretize
The original neural ODE paper popularized optimize-then-discretize with O(1) adjoints via backsolve. This is the methodology used by `BacksolveAdjoint`. When training non-stiff neural ODEs, `BacksolveAdjoint` with `ZygoteVJP` is generally the fastest method. Additionally, this method does not require storing the values of any intermediate points and is thus the most memory efficient. However, `BacksolveAdjoint` is prone to instabilities whenever the Lipschitz constant is sufficiently large, as in stiff equations, PDE discretizations, and many other contexts, so it is not used by default. When training a neural ODE for machine learning applications, the user should try `BacksolveAdjoint` and see if it is sufficiently accurate on their problem. More details on this topic can be found in Stiff Neural Ordinary Differential Equations.
Note that DiffEqFlux's implementation of `BacksolveAdjoint` includes an extra feature, `BacksolveAdjoint(checkpointing=true)`, which mixes checkpointing with `BacksolveAdjoint`. With this method, values from the forward pass are saved at the `saveat` points. Since the reverse solve should numerically match the forward pass, issues with divergence of the reverse pass are mitigated by restarting the reverse pass at the `saveat` values from the forward pass. This reduces the divergence and can lead to better gradients, at the cost of higher memory usage due to having to save some values of the forward pass. This can stabilize the adjoint in some applications, but for highly stiff applications the divergence can be too fast for this to work in practice.
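As a sketch, the checkpointed backsolve is selected the same way as any other `sensealg` (reusing the Lotka-Volterra example from above; solver and `saveat` values are illustrative):

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

function fiip(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

p = [1.5, 1.0, 3.0, 1.0]; u0 = [1.0, 1.0]
prob = ODEProblem(fiip, u0, (0.0, 10.0), p)

# Forward-pass values at the saveat points are stored and used to
# restart the reverse pass, mitigating divergence of the backsolve.
loss(u0, p) = sum(solve(prob, Tsit5(), u0 = u0, p = p, saveat = 0.1,
                        sensealg = BacksolveAdjoint(checkpointing = true)))
du0, dp = Zygote.gradient(loss, u0, p)
```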
To avoid the issues of solving the ODE backwards, `InterpolatingAdjoint` and `QuadratureAdjoint` utilize information from the forward pass. By default, these methods utilize the continuous solution provided by DifferentialEquations.jl in the calculations of the adjoint pass. `QuadratureAdjoint` uses this to build a continuous function for the solution of the adjoint equation and then performs an adaptive quadrature via Quadrature.jl, while `InterpolatingAdjoint` appends the integrand to the ODE so that it is computed simultaneously with the Lagrange multiplier. When memory is not an issue, we find that the `QuadratureAdjoint` approach tends to be the most efficient, as it has a significantly smaller adjoint differential equation and the quadrature converges very quickly; however, this form requires holding the full continuous solution of the adjoint, which can be a significant burden for large parameter problems. The `InterpolatingAdjoint` is thus a compromise between memory efficiency and compute efficiency, and is in the same spirit as CVODES.
However, if the memory cost of the `InterpolatingAdjoint` is too high, checkpointing can be used via `InterpolatingAdjoint(checkpointing=true)`. When this is used, the checkpoints default to `sol.t` of the forward pass (i.e. the saved timepoints, usually set by `saveat`). Then, in the adjoint, the intervals `sol.t[i-1]` to `sol.t[i]` are re-solved in order to obtain a short interpolation which can be utilized in the adjoints. This results in at most two full solves of the forward pass, but dramatically reduces the computational cost while being a low-memory format. This is the preferred method for highly stiff equations when memory is an issue, i.e. stiff PDEs or large neural DAEs.
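A sketch of enabling checkpointing for the interpolating adjoint, with the same illustrative Lotka-Volterra setup:

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

function fiip(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

p = [1.5, 1.0, 3.0, 1.0]; u0 = [1.0, 1.0]
prob = ODEProblem(fiip, u0, (0.0, 10.0), p)

# Checkpoints default to sol.t of the forward pass; each interval is
# re-solved on demand during the adjoint to build a short interpolation.
loss(u0, p) = sum(solve(prob, Tsit5(), u0 = u0, p = p, saveat = 0.1,
                        sensealg = InterpolatingAdjoint(checkpointing = true)))
du0, dp = Zygote.gradient(loss, u0, p)
```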
For forward mode, `ForwardSensitivity` is the version that performs the optimize-then-discretize approach. In this case, `autojacvec` corresponds to the method for computing `J*v` within the forward sensitivity equations, and is either `true` or `false` for whether to use Jacobian-free forward-mode AD (via ForwardDiff.jl) or Jacobian-free numerical differentiation.
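A sketch of selecting the continuous forward sensitivity method; since forward-mode continuous sensitivity analysis computes parameter sensitivities, the gradient here is taken with respect to `p` only (values are illustrative):

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

function fiip(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(fiip, [1.0, 1.0], (0.0, 10.0), p)

# autojacvec=true: Jacobian-free J*v via ForwardDiff.jl within the
# forward sensitivity equations.
loss(p) = sum(solve(prob, Tsit5(), p = p, saveat = 0.1,
                    sensealg = ForwardSensitivity(autojacvec = true)))
dp = Zygote.gradient(loss, p)
```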
Discretize-then-Optimize
In this approach, the discretization is done first, and then optimization is done on the discretized system. While traditionally this is done via discrete sensitivity analysis, it can be equivalently done by automatic differentiation of the solver itself. `ReverseDiffAdjoint` performs reverse-mode automatic differentiation of the solver via ReverseDiff.jl, `ZygoteAdjoint` performs reverse-mode automatic differentiation of the solver via Zygote.jl, and `TrackerAdjoint` performs reverse-mode automatic differentiation of the solver via Tracker.jl. In addition, `ForwardDiffSensitivity` performs forward-mode automatic differentiation of the solver via ForwardDiff.jl.
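A sketch of the discretize-then-optimize route, which differentiates through the solver's time steps themselves (shown here with `ReverseDiffAdjoint`; the setup is the same illustrative Lotka-Volterra system):

```julia
using DiffEqSensitivity, OrdinaryDiffEq, Zygote

function fiip(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

p = [1.5, 1.0, 3.0, 1.0]; u0 = [1.0, 1.0]
prob = ODEProblem(fiip, u0, (0.0, 10.0), p)

# Reverse-mode AD through the time stepping itself (a discrete adjoint).
loss(u0, p) = sum(solve(prob, Tsit5(), u0 = u0, p = p, saveat = 0.1,
                        sensealg = ReverseDiffAdjoint()))
du0, dp = Zygote.gradient(loss, u0, p)
```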
We note that many studies have suggested that this approach produces more accurate gradients than the optimize-then-discretize approach