NonconvexUtils


Useful hacks for use in Nonconvex.jl.

Hack #1: AbstractDiffFunction and ForwardDiffFunction

Nonconvex.jl uses Zygote.jl for automatic differentiation (AD). In order to force the use of another AD package for a function f, one can specify any AD backend from AbstractDifferentiation.jl in the following way:

g = AbstractDiffFunction(f, backend)

If you want to use ForwardDiff.jl to differentiate the function f, you can also use

g = ForwardDiffFunction(f)

which is short for:

AbstractDiffFunction(f, AbstractDifferentiation.ForwardDiffBackend())
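
A minimal sketch (the objective f below is made up for illustration and not part of the package) showing that differentiating the wrapped function with Zygote is routed through ForwardDiff internally:

using NonconvexUtils, Zygote

f(x) = sum(abs2, x)             # hypothetical objective, for illustration only
g = ForwardDiffFunction(f)      # all differentiation of g goes through ForwardDiff
Zygote.gradient(g, [1.0, 2.0])  # ([2.0, 4.0],), computed via ForwardDiff under the hood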

Hack #2: TraceFunction

Often one may want to store intermediate solutions, function values and gradients for visualisation or post-processing. This is currently not possible with Nonconvex.jl as not all solvers support a callback mechanism. To work around this, TraceFunction can be used to store input, output and optionally gradient values during the optimization:

g = TraceFunction(f; on_call = false, on_grad = true)

If the on_call keyword argument is set to true, the input and output values are stored every time the function g is called. If the on_grad keyword argument is set to true, the input, output and gradient values are stored every time the function g is differentiated with a ChainRules-compatible AD package such as Zygote.jl, which is used by Nonconvex.jl. The history is stored in g.trace.
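
A minimal sketch (the objective f is hypothetical, used only for illustration):

using NonconvexUtils, Zygote

f(x) = sum(abs2, x)                                    # hypothetical objective
g = TraceFunction(f; on_call = false, on_grad = true)
g([3.0, 4.0])                    # plain call, not recorded because on_call = false
Zygote.gradient(g, [1.0, 2.0])   # recorded because on_grad = true
g.trace                          # the recorded history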

Hack #3: CustomGradFunction

Often a function f has an analytic gradient function ∇f that is more efficient than using AD on f. The way to make use of this gradient function in Nonconvex.jl has been to define an rrule for the function f. Now the following can be used instead. This works for scalar-valued or vector-valued functions f, where ∇f is the gradient function or the Jacobian function, respectively.

g = CustomGradFunction(f, ∇f)
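
For example, the following sketch (f and ∇f are hypothetical, for illustration only) makes Zygote return the hand-written gradient instead of differentiating f itself:

using NonconvexUtils, Zygote

f(x) = sum(abs2, x)             # hypothetical objective
∇f(x) = 2 .* x                  # its analytic gradient
g = CustomGradFunction(f, ∇f)
Zygote.gradient(g, [1.0, 2.0])  # uses ∇f, returning ([2.0, 4.0],)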

Hack #4: CustomHessianFunction and Hessian-vector products

Similar to CustomGradFunction, if a function f has a custom gradient function ∇f and a custom Hessian function ∇²f, they can be passed to CustomHessianFunction to force Zygote to use them:

g = CustomHessianFunction(f, ∇f, ∇²f)
Zygote.gradient(g, x)
Zygote.jacobian(x -> Zygote.gradient(g, x)[1], x)

It is the user's responsibility to ensure that the custom Hessian is always a symmetric matrix.

Note that one has to use Zygote for both levels of differentiation for this to work, which currently makes it impossible to use directly in Nonconvex.jl, e.g. with IPOPT, because Nonconvex.jl uses ForwardDiff.jl for the second-order differentiation. This will be fixed soon by making more use of AbstractDifferentiation.jl once a ZygoteBackend is implemented there.
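
As a concrete sketch (the quadratic f, its gradient and Hessian below are made up for illustration):

using NonconvexUtils, Zygote, LinearAlgebra

f(x) = sum(abs2, x)                             # hypothetical objective
∇f(x) = 2 .* x                                  # analytic gradient
∇²f(x) = Matrix(2.0I, length(x), length(x))     # analytic (symmetric) Hessian
g = CustomHessianFunction(f, ∇f, ∇²f)
Zygote.gradient(g, [1.0, 2.0])                              # uses ∇f
Zygote.jacobian(x -> Zygote.gradient(g, x)[1], [1.0, 2.0])  # uses ∇²f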

If instead of ∇²f you only have access to a Hessian-vector product function hvp, which takes two inputs, x (the input to f) and v (the vector to multiply the Hessian H by), and returns H * v, you can use it as follows:

g = CustomHessianFunction(f, ∇f, hvp; hvp = true)
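
A minimal sketch with a hypothetical hvp (for this f the Hessian is 2I, so H * v == 2v):

using NonconvexUtils, Zygote

f(x) = sum(abs2, x)             # hypothetical objective
∇f(x) = 2 .* x                  # analytic gradient
hvp(x, v) = 2 .* v              # Hessian-vector product H * v, with H = 2I here
g = CustomHessianFunction(f, ∇f, hvp; hvp = true)
Zygote.gradient(g, [1.0, 2.0])  # still uses ∇f; second-order differentiation goes through hvp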