# Sensitivity Algorithms for Differential Equations with Automatic Differentiation (AD)

DiffEqSensitivity.jl's high level interface allows for specifying a sensitivity algorithm (`sensealg`

) to control the method by which `solve`

is differentiated in an automatic differentiation (AD) context by a compatible AD library. The underlying algorithms then use the direct interface methods, like `ODEForwardSensitivityProblem`

and `adjoint_sensitivities`

, to compute the derivatives without requiring the user to do any of the setup.

Current AD libraries whose calls are captured by the sensitivity system are:

## Using and Controlling Sensitivity Algorithms within AD

Take for example this simple differential equation solve on Lotka-Volterra:

```
using DiffEqSensitivity, OrdinaryDiffEq, Zygote
function fiip(du,u,p,t)
du[1] = dx = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = dy = -p[3]*u[2] + p[4]*u[1]*u[2]
end
p = [1.5,1.0,3.0,1.0]; u0 = [1.0;1.0]
prob = ODEProblem(fiip,u0,(0.0,10.0),p)
sol = solve(prob,Tsit5())
loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1))
du0,dp = Zygote.gradient(loss,u0,p)
```

This will compute the gradient of the loss function "sum of the values of the solution to the ODE at timepoints dt=0.1" using an adjoint method, where `du0`

is the derivative of the loss function with respect to the initial condition and `dp`

is the derivative of the loss function with respect to the parameters.

Because the gradient is calculated by `Zygote.gradient`

and Zygote.jl is one of the compatible AD libraries, this derivative calculation will be captured by the `sensealg`

system, and one of DiffEqSensitivity.jl's adjoint overloads will be used to compute the derivative. By default, if the `sensealg`

keyword argument is not defined, then a smart polyalgorithm is used to automatically determine the most appropriate method for a given equation.

Likewise, the `sensealg`

argument can be given to directly control the method by which the derivative is computed. For example:

```
loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1))
du0,dp = Zygote.gradient(loss,u0,p)
```

## Choosing a Sensitivity Algorithm

There are two classes of algorithms: the continuous sensitivity analysis methods, and the discrete sensitivity analysis methods (direct automatic differentiation). Generally:

- Continuous sensitivity analysis are more efficient while the discrete sensitivity analysis is more stable (full discussion is in the appendix of that paper)
- Continuous sensitivity analysis methods only support a subset of equations, which currently includes:
- ODEProblem (with mass matrices for differential-algebraic equations (DAEs)
- SDEProblem
- SteadyStateProblem / NonlinearProblem

- Discrete sensitivity analysis methods only support a subset of algorithms, namely, the pure Julia solvers which are written generically.

For an analysis of which methods will be most efficient for computing the solution derivatives for a given problem, consult our analysis in this arxiv paper. A general rule of thumb is:

`ForwardDiffSensitivity`

is the fastest for differential equations with small numbers of parameters (<100) and can be used on any differential equation solver that is native Julia. If the chosen ODE solver is not compatible with direct automatic differentiation,`ForwardSensitivty`

may be used instead.- Adjoint senstivity analysis is the fastest when the number of parameters is sufficiently large. There are three configurations of note. Using
`QuadratureAdjoint`

is the fastest but uses the most memory,`BacksolveAdjoint`

uses the least memory but on very stiff problems it may be unstable and require a lot of checkpoints, while`InterpolatingAdjoint`

is in the middle, allowing checkpointing to control total memory use. - The methods which use direct automatic differentiation (
`ReverseDiffAdjoint`

,`TrackerAdjoint`

,`ForwardDiffSensitivity`

, and`ZygoteAdjoint`

) support the full range of DifferentialEquations.jl features (SDEs, DDEs, events, etc.), but only work on native Julia solvers. - For non-ODEs with large numbers of parameters,
`TrackerAdjoint`

in out-of-place form may be the best performer on GPUs, and`ReverseDiffAdjoint`

`TrackerAdjoint`

is able to use a`TrackedArray`

form with out-of-place functions`du = f(u,p,t)`

but requires an`Array{TrackedReal}`

form for`f(du,u,p,t)`

mutating`du`

. The latter has much more overhead, and should be avoided if possible. Thus if solving non-ODEs with lots of parameters, using`TrackerAdjoint`

with an out-of-place definition may be the current best option.

Compatibility with direct automatic differentiation algorithms (`ForwardDiffSensitivity`

, `ReverseDiffAdjoint`

, etc.) can be queried using the `SciMLBase.isautodifferentiable(::SciMLAlgorithm)`

trait function.

If the chosen algorithm is a continuous sensitivity analysis algorithm, then an `autojacvec`

argument can be given for choosing how the Jacobian-vector product (`J*v`

) or vector-Jacobian product (`J'*v`

) calculation is computed. For the forward sensitivity methods, `autojacvec=true`

is the most efficient, though `autojacvec=false`

is slightly less accurate but very close in efficiency. For adjoint methods it's more complicated and dependent on the way that the user's `f`

function is implemented:

`EnzymeVJP()`

is the most efficient if it's applicable on your equation.- If your function has no branching (no if statements) but uses mutation,
`ReverseDiffVJP(true)`

will be the most efficient after Enzyme. Otherwise`ReverseDiffVJP()`

, but you may wish to proceed with eliminating mutation as without compilation enabled this can be slow. - If your on the CPU or GPU and your function is very vectorized and has no mutation, choose
`ZygoteVJP()`

. - Else fallback to
`TrackerVJP()`

if Zygote does not support the function.

## Special Notes on Non-ODE Differential Equation Problems

While all of the choices are compatible with ordinary differential equations, specific notices apply to other forms:

### Differential-Algebraic Equations

We note that while all 3 are compatible with index-1 DAEs via the derivation in the universal differential equations paper (note the reinitialization), we do not recommend `BacksolveAdjoint`

one DAEs because the stiffness inherent in these problems tends to cause major difficulties with the accuracy of the backwards solution due to reinitialization of the algebraic variables.

### Stochastic Differential Equations

We note that all of the adjoints except `QuadratureAdjoint`

are applicable to stochastic differential equations.

### Delay Differential Equations

We note that only the discretize-then-optimize methods are applicable to delay differential equations. Constant lag and variable lag delay differential equation parameters can be estimated, but the lag times themselves are unable to be estimated through these automatic differentiation techniques.

### Hybrid Equations (Equations with events/callbacks) and Jump Equations

`ForwardDiffSensitivity`

can differentiate code with callbacks when `convert_tspan=true`

. `ForwardSensitivity`

is not compatible with hybrid equations. The shadowing methods are not compatible with callbacks. All methods based on discrete adjoint sensitivity analysis via automatic differentiation, like `ReverseDiffAdjoint`

, `TrackerAdjoint`

, or `QuadratureAdjoint`

are fully compatible with events. This applies to ODEs, SDEs, DAEs, and DDEs. The continuous adjoint sensitivities `BacksolveAdjoint`

, `InterpolatingAdjoint`

, and `QuadratureAdjoint`

are compatible with events for ODEs. `BacksolveAdjoint`

and `InterpolatingAdjoint`

can also handle events for SDEs. Use `BacksolveAdjoint`

if the event terminates the time evolution and several states are saved. Currently, the continuous adjoint sensitivities do not support multiple events per time point.

## Manual VJPs

Note that when defining your differential equation the vjp can be manually overwritten by providing the `AbstractSciMLFunction`

definition with a `vjp(u,p,t)`

that returns a tuple `f(u,p,t),v->J*v`

in the form of ChainRules.jl. When this is done, the choice of `ZygoteVJP`

will utilize your VJP function during the internal steps of the adjoint. This is useful for models where automatic differentiation may have trouble producing optimal code. This can be paired with ModelingToolkit.jl for producing hyper-optimized, sparse, and parallel VJP functions utilizing the automated symbolic conversions.

## Sensitivity Algorithms

The following algorithm choices exist for `sensealg`

. See the sensitivity mathematics page for more details on the definition of the methods.

`DiffEqSensitivity.ForwardSensitivity`

— TypeForwardSensitivity{CS,AD,FDT} <: AbstractForwardSensitivityAlgorithm{CS,AD,FDT}

An implementation of continuous forward sensitivity analysis for propagating derivatives by solving the extended ODE. When used within adjoint differentiation (i.e. via Zygote), this will cause forward differentiation of the `solve`

call within the reverse-mode automatic differentiation environment.

**Constructor**

```
function ForwardSensitivity(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=autodiff,
autojacmat=false)
```

**Keyword Arguments**

`autodiff`

: Use automatic differentiation in the internal sensitivity algorithm computations. Default is`true`

.`chunk_size`

: Chunk size for forward mode differentiation if full Jacobians are built (`autojacvec=false`

and`autodiff=true`

). Default is`0`

for automatic choice of chunk size.`autojacvec`

: Calculate the Jacobian-vector product via automatic differentiation with special seeding.`diff_type`

: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with`autodiff=false`

.

Further details:

- If
`autodiff=true`

and`autojacvec=true`

, then the one chunk`J*v`

forward-mode directional derivative calculation trick is used to compute the product without constructing the Jacobian (via ForwardDiff.jl). - If
`autodiff=false`

and`autojacvec=true`

, then the numerical direction derivative trick`(f(x+epsilon*v)-f(x))/epsilon`

is used to compute`J*v`

without constructing the Jacobian. - If
`autodiff=true`

and`autojacvec=false`

, then the Jacobian is constructed via chunked forward-mode automatic differentiation (via ForwardDiff.jl). - If
`autodiff=false`

and`autojacvec=false`

, then the Jacobian is constructed via finite differences via FiniteDiff.jl.

**SciMLProblem Support**

This `sensealg`

only supports `ODEProblem`

s without callbacks (events).

`DiffEqSensitivity.ForwardDiffSensitivity`

— TypeForwardDiffSensitivity{CS,CTS} <: AbstractForwardSensitivityAlgorithm{CS,Nothing,Nothing}

An implementation of discrete forward sensitivity analysis through ForwardDiff.jl. When used within adjoint differentiation (i.e. via Zygote), this will cause forward differentiation of the `solve`

call within the reverse-mode automatic differentiation environment.

**Constructor**

`ForwardDiffSensitivity(;chunk_size=0,convert_tspan=nothing)`

**Keyword Arguments**

`chunk_size`

: the chunk size used by ForwardDiff for computing the Jacobian, i.e. the number of simultaneous columns computed.`convert_tspan`

: whether to convert time to also be`Dual`

valued. By default this is`nothing`

which will only convert if callbacks are found. Conversion is required in order to accurately differentiate callbacks (hybrid equations).

**SciMLProblem Support**

This `sensealg`

supports any `SciMLProblem`

s, provided that the solver algorithms is `SciMLBase.isautodifferentiable`

. Note that `ForwardDiffSensitivity`

can accurately differentiate code with callbacks only when `convert_tspan=true`

.

`DiffEqSensitivity.BacksolveAdjoint`

— TypeBacksolveAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}

An implementation of adjoint sensitivity analysis using a backwards solution of the ODE. By default this algorithm will use the values from the forward pass to perturb the backwards solution to the correct spot, allowing reduced memory (O(1) memory). Checkpointing stabilization is included for additional numerical stability over the naive implementation.

**Constructor**

```
BacksolveAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,
checkpointing=true, noisemixing=false)
```

**Keyword Arguments**

`autodiff`

: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to`true`

.`chunk_size`

: Chunk size for forward-mode differentiation if full Jacobians are built (`autojacvec=false`

and`autodiff=true`

). Default is`0`

for automatic choice of chunk size.`diff_type`

: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with`autodiff=false`

.`autojacvec`

: Calculate the vector-Jacobian product (`J'*v`

) via automatic differentiation with special seeding. The default is`true`

. The total set of choices are:`false`

: the Jacobian is constructed via FiniteDiff.jl`true`

: the Jacobian is constructed via ForwardDiff.jl`TrackerVJP`

: Uses Tracker.jl for the vjp.`ZygoteVJP`

: Uses Zygote.jl for the vjp.`EnzymeVJP`

: Uses Enzyme.jl for the vjp.`ReverseDiffVJP(compile=false)`

: Uses ReverseDiff.jl for the vjp.`compile`

is a boolean for whether to precompile the tape, which should only be done if there are no branches (`if`

or`while`

statements) in the`f`

function.

`checkpointing`

: whether checkpointing is enabled for the reverse pass. Defaults to`true`

.`noisemixing`

: Handle noise processes that are not of the form`du[i] = f(u[i])`

. For example, to compute the sensitivities of an SDE with diagonal diffusion

correctly,`function g_mixing!(du,u,p,t) du[1] = p[3]*u[1] + p[4]*u[2] du[2] = p[3]*u[1] + p[4]*u[2] nothing end`

`noisemixing=true`

must be enabled. The default is`false`

.

For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.

**Applicability of Backsolve and Caution**

When `BacksolveAdjoint`

is applicable, it is a fast method and requires the least memory. However, one must be cautious because not all ODEs are stable under backwards integration by the majority of ODE solvers. An example of such an equation is the Lorenz equation. Notice that if one solves the Lorenz equation forward and then in reverse with any adaptive time step and non-reversible integrator, then the backwards solution diverges from the forward solution. As a quick demonstration:

```
using Sundials
function lorenz(du,u,p,t)
du[1] = 10.0*(u[2]-u[1])
du[2] = u[1]*(28.0-u[3]) - u[2]
du[3] = u[1]*u[2] - (8/3)*u[3]
end
u0 = [1.0;0.0;0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz,u0,tspan)
sol = solve(prob,Tsit5(),reltol=1e-12,abstol=1e-12)
prob2 = ODEProblem(lorenz,sol[end],(100.0,0.0))
sol = solve(prob,Tsit5(),reltol=1e-12,abstol=1e-12)
@show sol[end]-u0 #[-3.22091, -1.49394, 21.3435]
```

Thus one should check the stability of the backsolve on their type of problem before enabling this method. Additionally, using checkpointing with backsolve can be a low memory way to stabilize it.

For more details on this topic, see Stiff Neural Ordinary Differential Equations.

**Checkpointing**

To improve the numerical stability of the reverse pass, `BacksolveAdjoint`

includes a checkpointing feature. If `sol.u`

is a time series, then whenever a time `sol.t`

is hit while reversing, a callback will replace the reversing ODE portion with `sol.u[i]`

. This nudges the solution back onto the appropriate trajectory and reduces the numerical caused by drift.

**SciMLProblem Support**

This `sensealg`

only supports `ODEProblem`

s, `SDEProblem`

s, and `RODEProblem`

s. This `sensealg`

supports callback functions (events).

**References**

ODE: Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385

Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)

Chen, R.T.Q. and Rubanova, Y. and Bettencourt, J. and Duvenaud, D. K., Neural ordinary differential equations. In Advances in neural information processing systems, pp. 6571–6583 (2018)

Pontryagin, L. S. and Mishchenko, E.F. and Boltyanskii, V.G. and Gamkrelidze, R.V. The mathematical theory of optimal processes. Routledge, (1962)

Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892

DAE: Cao, Y. and Li, S. and Petzold, L. and Serban, R., Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution, SIAM journal on scientific computing 24 pp: 1076-1089 (2003)

SDE: Gobet, E. and Munos, R., Sensitivity Analysis Using Ito-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control, SIAM Journal on control and optimization, 43, pp. 1676-1713 (2005)

Li, X. and Wong, T.-K. L.and Chen, R. T. Q. and Duvenaud, D., Scalable Gradients for Stochastic Differential Equations, PMLR 108, pp. 3870-3882 (2020), http://proceedings.mlr.press/v108/li20i.html

`DiffEqSensitivity.InterpolatingAdjoint`

— TypeInterpolatingAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}

An implementation of adjoint sensitivity analysis which uses the interpolation of the forward solution for the reverse solve vector-Jacobian products. By default it requires a dense solution of the forward pass and will internally ignore saving arguments during the gradient calculation. When checkpointing is enabled it will only require the memory to interpolate between checkpoints.

**Constructor**

```
function InterpolatingAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,
checkpointing=false, noisemixing=false)
```

**Keyword Arguments**

`autodiff`

: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to`true`

.`chunk_size`

: Chunk size for forward-mode differentiation if full Jacobians are built (`autojacvec=false`

and`autodiff=true`

). Default is`0`

for automatic choice of chunk size.`diff_type`

: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with`autodiff=false`

.`autojacvec`

: Calculate the vector-Jacobian product (`J'*v`

) via automatic differentiation with special seeding. The default is`true`

. The total set of choices are:`false`

: the Jacobian is constructed via FiniteDiff.jl`true`

: the Jacobian is constructed via ForwardDiff.jl`TrackerVJP`

: Uses Tracker.jl for the vjp.`ZygoteVJP`

: Uses Zygote.jl for the vjp.`EnzymeVJP`

: Uses Enzyme.jl for the vjp.`ReverseDiffVJP(compile=false)`

: Uses ReverseDiff.jl for the vjp.`compile`

is a boolean for whether to precompile the tape, which should only be done if there are no branches (`if`

or`while`

statements) in the`f`

function.

`checkpointing`

: whether checkpointing is enabled for the reverse pass. Defaults to`true`

.`noisemixing`

: Handle noise processes that are not of the form`du[i] = f(u[i])`

. For example, to compute the sensitivities of an SDE with diagonal diffusion

correctly,`function g_mixing!(du,u,p,t) du[1] = p[3]*u[1] + p[4]*u[2] du[2] = p[3]*u[1] + p[4]*u[2] nothing end`

`noisemixing=true`

must be enabled. The default is`false`

.

For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.

**Checkpointing**

To reduce the memory usage of the reverse pass, `InterpolatingAdjoint`

includes a checkpointing feature. If `sol`

is `dense`

, checkpointing is ignored and the continuous solution is used for calculating `u(t)`

at arbitrary time points. If `checkpointing=true`

and `sol`

is not `dense`

, then dense intervals between `sol.t[i]`

and `sol.t[i+1]`

are reconstructed on-demand for calculating `u(t)`

at arbitrary time points. This reduces the total memory requirement to only the cost of holding the dense solution over the largest time interval (in terms of number of required steps). The total compute cost is no more than double the original forward compute cost.

**SciMLProblem Support**

This `sensealg`

only supports `ODEProblem`

s, `SDEProblem`

s, and `RODEProblem`

s. This `sensealg`

supports callbacks (events).

**References**

Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385

Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)

Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892

`DiffEqSensitivity.QuadratureAdjoint`

— TypeQuadratureAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}

An implementation of adjoint sensitivity analysis which develops a full continuous solution of the reverse solve in order to perform a post-ODE quadrature. This method requires the the dense solution and will ignore saving arguments during the gradient calculation. The tolerances in the constructor control the inner quadrature. The inner quadrature uses a ReverseDiff vjp if autojacvec, and `compile=false`

by default but can compile the tape under the same circumstances as `ReverseDiffVJP`

.

This method is O(n^3 + p) for stiff / implicit equations (as opposed to the O((n+p)^3) scaling of BacksolveAdjoint and InterpolatingAdjoint), and thus is much more compute efficient. However, it requires holding a dense reverse pass and is thus memory intensive.

**Constructor**

```
function QuadratureAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,abstol=1e-6,
reltol=1e-3,compile=false)
```

**Keyword Arguments**

`autodiff`

: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to`true`

.`chunk_size`

: Chunk size for forward-mode differentiation if full Jacobians are built (`autojacvec=false`

and`autodiff=true`

). Default is`0`

for automatic choice of chunk size.`diff_type`

: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with`autodiff=false`

.`autojacvec`

: Calculate the vector-Jacobian product (`J'*v`

) via automatic differentiation with special seeding. The default is`true`

. The total set of choices are:`false`

: the Jacobian is constructed via FiniteDiff.jl`true`

: the Jacobian is constructed via ForwardDiff.jl`TrackerVJP`

: Uses Tracker.jl for the vjp.`ZygoteVJP`

: Uses Zygote.jl for the vjp.`EnzymeVJP`

: Uses Enzyme.jl for the vjp.`ReverseDiffVJP(compile=false)`

: Uses ReverseDiff.jl for the vjp.`compile`

is a boolean for whether to precompile the tape, which should only be done if there are no branches (`if`

or`while`

statements) in the`f`

function.

`abstol`

: absolute tolerance for the quadrature calculation`reltol`

: relative tolerance for the quadrature calculation`compile`

: whether to compile the vjp calculation for the integrand calculation. See`ReverseDiffVJP`

for more details.

For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.

**SciMLProblem Support**

This `sensealg`

only supports `ODEProblem`

s. This `sensealg`

supports events (callbacks).

**References**

Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385

Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)

Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892

Kim, S., Ji, W., Deng, S., Ma, Y., & Rackauckas, C. (2021). Stiff neural ordinary differential equations. Chaos: An Interdisciplinary Journal of Nonlinear Science, 31(9), 093122.

`DiffEqSensitivity.ReverseDiffAdjoint`

— TypeReverseDiffAdjoint <: AbstractAdjointSensitivityAlgorithm{nothing,true,nothing}

An implementation of discrete adjoint sensitivity analysis using the ReverseDiff.jl tracing-based AD. Supports in-place functions through an Array of Structs formulation, and supports out of place through struct of arrays.

**Constructor**

`ReverseDiffAdjoint()`

**SciMLProblem Support**

This `sensealg`

supports any `DEProblem`

if the algorithm is `SciMLBase.isautodifferentiable`

. Requires that the state variables are CPU-based `Array`

types.

`DiffEqSensitivity.TrackerAdjoint`

— TypeTrackerAdjoint <: AbstractAdjointSensitivityAlgorithm{nothing,true,nothing}

An implementation of discrete adjoint sensitivity analysis using the Tracker.jl tracing-based AD. Supports in-place functions through an Array of Structs formulation, and supports out of place through struct of arrays.

**Constructor**

`TrackerAdjoint()`

**SciMLProblem Support**

This `sensealg`

supports any `DEProblem`

if the algorithm is `SciMLBase.isautodifferentiable`

Compatible with a limited subset of `AbstractArray`

types for `u0`

, including `CuArrays`

.

TrackerAdjoint is incompatible with Stiff ODE solvers using forward-mode automatic differentiation for the Jacobians. Thus for example, `TRBDF2()`

will error. Instead, use `autodiff=false`

, i.e. `TRBDF2(autodiff=false)`

. This will only remove the forward-mode automatic differentiation of the Jacobian construction, not the reverse-mode AD usage, and thus performance will still be nearly the same, though Jacobian accuracy may suffer which could cause more steps to be required.

`DiffEqSensitivity.ZygoteAdjoint`

— TypeZygoteAdjoint <: AbstractAdjointSensitivityAlgorithm{nothing,true,nothing}

An implementation of discrete adjoint sensitivity analysis using the Zygote.jl source-to-source AD directly on the differential equation solver.

**Constructor**

`ZygoteAdjoint()`

**SciMLProblem Support**

Currently fails on almost every solver.

`DiffEqSensitivity.ForwardLSS`

— TypeForwardLSS{CS,AD,FDT,RType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}

An implementation of the discrete, forward-mode least squares shadowing (LSS) method. LSS replaces the ill-conditioned initial value probem (`ODEProblem`

) for chaotic systems by a well-conditioned least-squares problem. This allows for computing sensitivities of long-time averaged quantities with respect to the parameters of the `ODEProblem`

. The computational cost of LSS scales as (number of states x number of time steps). Converges to the correct sensitivity at a rate of `T^(-1/2)`

, where `T`

is the time of the trajectory. See `NILSS()`

and `NILSAS()`

for a more efficient non-intrusive formulation.

**Constructor**

```
ForwardLSS(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
LSSregularizer=TimeDilation(10.0,0.0,0.0),
g=nothing)
```

**Keyword Arguments**

`autodiff`

: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to`true`

.`chunk_size`

: Chunk size for forward-mode differentiation if full Jacobians are built (`autojacvec=false`

and`autodiff=true`

). Default is`0`

for automatic choice of chunk size.`diff_type`

: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with`autodiff=false`

.`LSSregularizer`

: Using`LSSregularizer`

, one can choose between three different regularization routines. The default choice is`TimeDilation(10.0,0.0,0.0)`

.`CosWindowing()`

: cos windowing of the time grid, i.e. the time grid (saved time steps) is transformed using a cosine.`Cos2Windowing()`

: cos^2 windowing of the time grid.`TimeDilation(alpha::Number,t0skip::Number,t1skip::Number)`

: Corresponds to a time dilation.`alpha`

controls the weight.`t0skip`

and`t1skip`

indicate the times truncated at the beginnning and end of the trajectory, respectively.

`g`

: instantaneous objective function of the long-time averaged objective.

**SciMLProblem Support**

This `sensealg`

only supports `ODEProblem`

s. This `sensealg`

does not support events (callbacks). This `sensealg`

assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.

**References**

Wang, Q., Hu, R., and Blonigan, P. Least squares shadowing sensitivity analysis of chaotic limit cycle oscillations. Journal of Computational Physics, 267, 210-224 (2014).

Wang, Q., Convergence of the Least Squares Shadowing Method for Computing Derivative of Ergodic Averages, SIAM Journal on Numerical Analysis, 52, 156–170 (2014).

Blonigan, P., Gomez, S., Wang, Q., Least Squares Shadowing for sensitivity analysis of turbulent fluid flows, in: 52nd Aerospace Sciences Meeting, 1–24 (2014).

`DiffEqSensitivity.AdjointLSS`

— TypeAdjointLSS{CS,AD,FDT,RType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}

An implementation of the discrete, adjoint-mode least square shadowing method. LSS replaces the ill-conditioned initial value probem (`ODEProblem`

) for chaotic systems by a well-conditioned least-squares problem. This allows for computing sensitivities of long-time averaged quantities with respect to the parameters of the `ODEProblem`

. The computational cost of LSS scales as (number of states x number of time steps). Converges to the correct sensitivity at a rate of `T^(-1/2)`

, where `T`

is the time of the trajectory. See `NILSS()`

and `NILSAS()`

for a more efficient non-intrusive formulation.

**Constructor**

```
AdjointLSS(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
LSSRegularizer=CosWindowing(),
g=nothing)
```

**Keyword Arguments**

`autodiff`

: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to`true`

.`chunk_size`

: Chunk size for forward-mode differentiation if full Jacobians are built (`autojacvec=false`

and`autodiff=true`

). Default is`0`

for automatic choice of chunk size.`diff_type`

: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with`autodiff=false`

.`LSSregularizer`

: Using`LSSregularizer`

, one can choose between different regularization routines. The default choice is`TimeDilation(10.0,0.0,0.0)`

.`TimeDilation(alpha::Number,t0skip::Number,t1skip::Number)`

: Corresponds to a time dilation.`alpha`

controls the weight.`t0skip`

and`t1skip`

indicate the times truncated at the beginnning and end of the trajectory, respectively. The default value for`t0skip`

and`t1skip`

is`zero(alpha)`

.

`g`

: instantaneous objective function of the long-time averaged objective.

**SciMLProblem Support**

This `sensealg`

only supports `ODEProblem`

s. This `sensealg`

does not support events (callbacks). This `sensealg`

assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.

**References**

Wang, Q., Hu, R., and Blonigan, P. Least squares shadowing sensitivity analysis of chaotic limit cycle oscillations. Journal of Computational Physics, 267, 210-224 (2014).

`DiffEqSensitivity.NILSS`

— Typestruct NILSS{CS,AD,FDT,RNG,nType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}

An implementation of the forward-mode, continuous non-intrusive least squares shadowing method. `NILSS`

allows for computing sensitivities of long-time averaged quantities with respect to the parameters of an `ODEProblem`

by constraining the computation to the unstable subspace. `NILSS`

employs the continuous-time `ForwardSensitivity`

method as tangent solver. To avoid an exponential blow-up of the (homogenous and inhomogenous) tangent solutions, the trajectory should be divided into sufficiently small segments, where the tangent solutions are rescaled on the interfaces. The computational and memory cost of NILSS scale with the number of unstable (positive) Lyapunov exponents (instead of the number of states as in the LSS method). `NILSS`

avoids the explicit construction of the Jacobian at each time step and thus should generally be preferred (for large system sizes) over `ForwardLSS`

.

**Constructor**

```
NILSS(nseg, nstep; nus = nothing,
rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)),
chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=autodiff,
g=nothing)
```

**Arguments**

`nseg`

: Number of segments on full time interval on the attractor.`nstep`

: number of steps on each segment.

**Keyword Arguments**

`nus`

: Dimension of the unstable subspace. Default is`nothing`

.`nus`

must be smaller or equal to the state dimension (`length(u0)`

). With the default choice,`nus = length(u0) - 1`

will be set at compile time.`rng`

: (Pseudo) random number generator. Used for initializing the homogenous tangent states (`w`

). Default is`Xorshifts.Xoroshiro128Plus(rand(UInt64))`

.`autodiff`

: Use automatic differentiation in the internal sensitivity algorithm computations. Default is`true`

.`chunk_size`

: Chunk size for forward mode differentiation if full Jacobians are built (`autojacvec=false`

and`autodiff=true`

). Default is`0`

for automatic choice of chunk size.`autojacvec`

: Calculate the Jacobian-vector product via automatic differentiation with special seeding.`diff_type`

: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with`autodiff=false`

.`g`

: instantaneous objective function of the long-time averaged objective.

**SciMLProblem Support**

This `sensealg`

only supports `ODEProblem`

s. This `sensealg`

does not support events (callbacks). This `sensealg`

assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.

**References**

Ni, A., Blonigan, P. J., Chater, M., Wang, Q., Zhang, Z., Sensitivity analy- sis on chaotic dynamical system by Non-Intrusive Least Square Shadowing (NI-LSS), in: 46th AIAA Fluid Dynamics Conference, AIAA AVIATION Forum (AIAA 2016-4399), American Institute of Aeronautics and Astronautics, 1–16 (2016).

Ni, A., and Wang, Q. Sensitivity analysis on chaotic dynamical systems by Non-Intrusive Least Squares Shadowing (NILSS). Journal of Computational Physics 347, 56-77 (2017).

`DiffEqSensitivity.NILSAS`

— TypeNILSAS{CS,AD,FDT,RNG,SENSE,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}

An implementation of the adjoint-mode, continuous non-intrusive adjoint least squares shadowing method. `NILSAS`

allows for computing sensitivities of long-time averaged quantities with respect to the parameters of an `ODEProblem`

by constraining the computation to the unstable subspace. `NILSAS`

employs SciMLSensitivity.jl's continuous adjoint sensitivity methods on each segment to compute (homogenous and inhomogenous) adjoint solutions. To avoid an exponential blow-up of the adjoint solutions, the trajectory should be divided into sufficiently small segments, where the adjoint solutions are rescaled on the interfaces. The computational and memory cost of NILSAS scale with the number of unstable, adjoint Lyapunov exponents (instead of the number of states as in the LSS method). `NILSAS`

avoids the explicit construction of the Jacobian at each time step and thus should generally be preferred (for large system sizes) over `AdjointLSS`

. `NILSAS`

is favourable over `NILSS`

for many parameters because NILSAS computes the gradient with respect to multiple parameters with negligible additional cost.

**Constructor**

```
NILSAS(nseg, nstep, M=nothing; rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)),
adjoint_sensealg = BacksolveAdjoint(autojacvec=ReverseDiffVJP()),
chunk_size=0,autodiff=true,
diff_type=Val{:central},
g=nothing
)
```

**Arguments**

`nseg`

: Number of segments on full time interval on the attractor.`nstep`

: number of steps on each segment.`M`

: number of homogenous adjoint solutions. This number must be bigger or equal than the number of (positive, adjoint) Lyapunov exponents. Default is`nothing`

.

**Keyword Arguments**

`rng`

: (Pseudo) random number generator. Used for initializing the terminate conditions of the homogenous adjoint states (`w`

). Default is`Xorshifts.Xoroshiro128Plus(rand(UInt64))`

.`adjoint_sensealg`

: Continuous adjoint sensitivity method to compute homogenous and inhomogenous adjoint solutions on each segment. Default is`BacksolveAdjoint(autojacvec=ReverseDiffVJP())`

.`autojacvec`

: Calculate the vector-Jacobian product (`J'*v`

) via automatic

`true`

. The total set of choices are:`false`

: the Jacobian is constructed via FiniteDiff.jl`true`

: the Jacobian is constructed via ForwardDiff.jl`TrackerVJP`

: Uses Tracker.jl for the vjp.`ZygoteVJP`

: Uses Zygote.jl for the vjp.`EnzymeVJP`

: Uses Enzyme.jl for the vjp.`ReverseDiffVJP(compile=false)`

: Uses ReverseDiff.jl for the vjp.`compile`

is a boolean for whether to precompile the tape, which should only be done if there are no branches (`if`

or`while`

statements) in the`f`

function.

`autodiff`

: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to`true`

.`chunk_size`

: Chunk size for forward-mode differentiation if full Jacobians are built (`autojacvec=false`

and`autodiff=true`

). Default is`0`

for automatic choice of chunk size.`diff_type`

: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with`autodiff=false`

.`g`

: instantaneous objective function of the long-time averaged objective.

**SciMLProblem Support**

`sensealg`

only supports `ODEProblem`

s. This `sensealg`

does not support events (callbacks). This `sensealg`

assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.

**References**

Ni, A., and Talnikar, C., Adjoint sensitivity analysis on chaotic dynamical systems by Non-Intrusive Least Squares Adjoint Shadowing (NILSAS). Journal of Computational Physics 395, 690-709 (2019).

## Vector-Jacobian Product (VJP) Choices

`DiffEqSensitivity.ZygoteVJP`

— TypeZygoteVJP <: VJPChoice

Uses Zygote.jl to compute vector-Jacobian products. Tends to be the fastest VJP method if the ODE/DAE/SDE/DDE is written with mostly vectorized functions (like neural networks and other layers from Flux.jl) and the `f`

functions is given out-of-place. If the `f`

function is in-place, then `Zygote.Buffer`

arrays are used internally which can greatly reduce the performance of the VJP method.

**Constructor**

`ZygoteVJP(compile=false)`

`DiffEqSensitivity.EnzymeVJP`

— TypeEnzymeVJP <: VJPChoice

Uses Enzyme.jl to compute vector-Jacobian products. Is the fastest VJP whenever applicable, though Enzyme.jl currently has low coverage over the Julia programming language, for example restricting the user's defined `f`

function to not do things like require garbage collection or calls to BLAS/LAPACK. However, mutation is supported, meaning that in-place `f`

with fully mutating non-allocating code will work with Enzyme (provided no high level calls to C like BLAS/LAPACK are used) and this will be the most efficient adjoint implementation.

**Constructor**

`EnzymeVJP(compile=false)`

`DiffEqSensitivity.TrackerVJP`

— TypeTrackerVJP <: VJPChoice

Uses Tracker.jl to compute the vector-Jacobian products. If `f`

is in-place, then it uses a array of structs formulation to do scalarized reverse mode, while if `f`

is out-of-place then it uses an array-based reverse mode.

Not as efficient as `ReverseDiffVJP`

, but supports GPUs when doing array-based reverse mode.

**Constructor**

`TrackerVJP(compile=false)`

`DiffEqSensitivity.ReverseDiffVJP`

— TypeReverseDiffVJP{compile} <: VJPChoice

Uses ReverseDiff.jl to compute the vector-Jacobian products. If `f`

is in-place, then it uses a array of structs formulation to do scalarized reverse mode, while if `f`

is out-of-place then it uses an array-based reverse mode.

Usually the fastest when scalarized operations exist in the f function (like in scientific machine learning applications like Universal Differential Equations) and the boolean compilation is enabled (i.e. ReverseDiffVJP(true)), if EnzymeVJP fails on a given choice of `f`

.

Does not support GPUs (CuArrays).

**Constructor**

`ReverseDiffVJP(compile=false)`

**Keyword Arguments**

`compile`

: Whether to cache the compilation of the reverse tape. This heavily increases the performance of the method but requires that the`f`

function of the ODE/DAE/SDE/DDE has no branching.

## More Details on Sensitivity Algorithm Choices

The following section describes a bit more details to consider when choosing a sensitivity algorithm.

### Optimize-then-Discretize

The original neural ODE paper popularized optimize-then-discretize with O(1) adjoints via backsolve. This is the methodology `BacksolveAdjoint`

When training non-stiff neural ODEs, `BacksolveAdjoint`

with `ZygoteVJP`

is generally the fastest method. Additionally, this method does not require storing the values of any intermediate points and is thus the most memory efficient. However, `BacksolveAdjoint`

is prone to instabilities whenever the Lipschitz constant is sufficiently large, like in stiff equations, PDE discretizations, and many other contexts, so it is not used by default. When training a neural ODE for machine learning applications, the user should try `BacksolveAdjoint`

and see if it is sufficiently accurate on their problem. More details on this topic can be found in Stiff Neural Ordinary Differential Equations

Note that DiffEqFlux's implementation of `BacksolveAdjoint`

includes an extra feature `BacksolveAdjoint(checkpointing=true)`

which mixes checkpointing with `BacksolveAdjoint`

. What this method does is that, at `saveat`

points, values from the forward pass are saved. Since the reverse solve should numerically be the same as the forward pass, issues with divergence of the reverse pass are mitigated by restarting the reverse pass at the `saveat`

value from the forward pass. This reduces the divergence and can lead to better gradients at the cost of higher memory usage due to having to save some values of the forward pass. This can stabilize the adjoint in some applications, but for highly stiff applications the divergence can be too fast for this to work in practice.

To avoid the issues of backwards solving the ODE, `InterpolatingAdjoint`

and `QuadratureAdjoint`

utilize information from the forward pass. By default these methods utilize the continuous solution provided by DifferentialEquations.jl in the calculations of the adjoint pass. `QuadratureAdjoint`

uses this to build a continuous function for the solution of adjoint equation and then performs an adaptive quadrature via Quadrature.jl, while `InterpolatingAdjoint`

appends the integrand to the ODE so it's computed simultaneously to the Lagrange multiplier. When memory is not an issue, we find that the `QuadratureAdjoint`

approach tends to be the most efficient as it has a significantly smaller adjoint differential equation and the quadrature converges very fast, but this form requires holding the full continuous solution of the adjoint which can be a significant burden for large parameter problems. The `InterpolatingAdjoint`

is thus a compromise between memory efficiency and compute efficiency, and is in the same spirit as CVODES.

However, if the memory cost of the `InterpolatingAdjoint`

is too high, checkpointing can be used via `InterpolatingAdjoint(checkpointing=true)`

. When this is used, the checkpoints default to `sol.t`

of the forward pass (i.e. the saved timepoints usually set by `saveat`

). Then in the adjoint, intervals of `sol.t[i-1]`

to `sol.t[i]`

are re-solved in order to obtain a short interpolation which can be utilized in the adjoints. This at most results in two full solves of the forward pass, but dramatically reduces the computational cost while being a low-memory format. This is the preferred method for highly stiff equations when memory is an issue, i.e. stiff PDEs or large neural DAEs.

For forward-mode, the `ForwardSensitivty`

is the version that performs the optimize-then-discretize approach. In this case, `autojacvec`

corresponds to the method for computing `J*v`

within the forward sensitivity equations, which is either `true`

or `false`

for whether to use Jacobian-free forward-mode AD (via ForwardDiff.jl) or Jacobian-free numerical differentiation.

### Discretize-then-Optimize

In this approach the discretization is done first and then optimization is done on the discretized system. While traditionally this can be done discrete sensitivity analysis, this is can be equivalently done by automatic differentiation on the solver itself. `ReverseDiffAdjoint`

performs reverse-mode automatic differentiation on the solver via ReverseDiff.jl, `ZygoteAdjoint`

performs reverse-mode automatic differentiation on the solver via Zygote.jl, and `TrackerAdjoint`

performs reverse-mode automatic differentiation on the solver via Tracker.jl. In addition, `ForwardDiffSensitivty`

performs forward-mode automatic differentiation on the solver via ForwardDiff.jl.

We note that many studies have suggested that this approach produces more accurate gradients than the optimize-than-discretize approach