More options
Specifying gradient options
Function VI allows the user to obtain a Gaussian approximation with minimal requirements. The user only needs to code a function logp that implements the log-posterior, provide an initial starting point x₀, and call:
# log-posterior is a Gaussian with zero mean and unit covariance.
# Hence, our approximation should be exact in this example.
logp(x) = -sum(x.*x) / 2
# implicitly specifies that the log-posterior is 5-dimensional
x₀ = randn(5)
# obtain approximation
q, logev = VI(logp, x₀, S = 200, iterations = 10_000, show_every = 200)
# Check that mean is close to zero and covariance close to identity.
# mean and cov are functions re-exported from Distributions.jl
mean(q)
cov(q)
However, providing a gradient for logp can speed up the computation in VI.
➤ Gradient free mode
Specified by gradientmode = :gradientfree.
If no options relating to the gradient are specified, i.e. neither gradientmode nor gradlogp is given, VI will by default use the gradient-free Optim.NelderMead optimiser internally. The user can explicitly request this gradient-free optimisation algorithm by setting gradientmode = :gradientfree.
➤ Automatic differentiation mode
Specified by gradientmode = :forward.
If logp implements a differentiable function, its gradient can be conveniently computed using automatic differentiation. By specifying gradientmode = :forward, function VI will internally use ForwardDiff to calculate the gradient of logp. In this case, VI will internally use the Optim.LBFGS optimiser.
q, logev = VI(logp, x₀, S = 200, iterations = 30, show_every = 1, gradientmode = :forward)
We note that with gradientmode = :forward we arrive at a result in fewer iterations than in the gradient-free case.
➤ Gradient provided
Specified by gradientmode = :provided.
The user can provide a gradient for logp via the gradlogp option:
# Let us calculate the gradient explicitly
gradlogp(x) = -x
q, logev = VI(logp, x₀, gradlogp = gradlogp, S = 200, iterations = 30, show_every = 1, gradientmode = :provided)
In this case, VI will internally use the Optim.LBFGS optimiser. Again, we arrive at a result in fewer iterations than in the gradient-free case.
Even if a gradient has been explicitly provided via the gradlogp option, the user still needs to specify gradientmode = :provided to instruct VI to use the provided gradient.
Evaluating the lower bound on test samples
The option S specifies the number of samples used when approximating the expected lower bound; see Technical description. The higher the value of S, the better the approximation, but at a higher computational cost. The lower the value of S, the faster the computation, but the approximation may be poorer. Hence, when setting S we need to take this trade-off into account.
Function VI offers a mechanism that informs us whether S is set to a high enough value. This mechanism makes use of two options, namely Stest and test_every. Option Stest defines the number of test samples used exclusively for evaluating (not optimising!) the Kullback-Leibler divergence every test_every iterations. Monitoring the Kullback-Leibler divergence in this way offers an effective way of detecting whether S has been set sufficiently high.
Every test_every iterations, function VI will report the value of ....