# More options

## Specifying gradient options

Function `VI` allows the user to obtain a Gaussian approximation with minimal requirements. The user only needs to code a function `logp` that implements the log-posterior, provide an initial starting point `x₀`, and call:

```
# log-posterior is a Gaussian with zero mean and unit covariance.
# Hence, our approximation should be exact in this example.
logp(x) = -sum(x.*x) / 2
# implicitly specifies that the log-posterior is 5-dimensional
x₀ = randn(5)
# obtain approximation
q, logev = VI(logp, x₀, S = 200, iterations = 10_000, show_every = 200)
# Check that mean is close to zero and covariance close to identity.
# mean and cov are re-exported functions from Distributions.jl
mean(q)
cov(q)
```

However, providing a gradient for `logp` can speed up the computation in `VI`.

##### ➤ Gradient free mode

*Specify by* `gradientmode = :gradientfree`.

If no option relating to the gradient is specified, i.e. neither `gradientmode` nor `gradlogp` is given, `VI` will by default internally use the `Optim.NelderMead` optimiser, which does not need a gradient.

The user can explicitly specify that the algorithm should use the gradient-free `Optim.NelderMead` optimisation algorithm by setting `gradientmode = :gradientfree`, as sketched below.
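A minimal sketch of such an explicit call, reusing `logp` and `x₀` from the example above (the option values are illustrative):

```
# Explicitly request the gradient-free Optim.NelderMead optimiser
q, logev = VI(logp, x₀, S = 200, iterations = 10_000, show_every = 200, gradientmode = :gradientfree)
```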

##### ➤ Automatic differentiation mode

*Specify by* `gradientmode = :forward`.

If `logp` implements a differentiable function, then its gradient can be conveniently computed using automatic differentiation. By specifying `gradientmode = :forward`, function `VI` will internally use ForwardDiff to calculate the gradient of `logp`. In this case, `VI` will internally use the `Optim.LBFGS` optimiser.

```
q, logev = VI(logp, x₀, S = 200, iterations = 30, show_every = 1, gradientmode = :forward)
```

We note that with `gradientmode = :forward` we arrive at a result in fewer iterations than in the gradient-free case.
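As a quick sanity check, independent of `VI`, one can call ForwardDiff directly and verify that the automatically computed gradient matches the analytical one for the `logp` defined above:

```
using ForwardDiff
# For logp(x) = -sum(x.*x)/2 the analytical gradient is -x
x = randn(5)
ForwardDiff.gradient(logp, x) ≈ -x   # should return true
```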

##### ➤ Gradient provided

*Specify by* `gradientmode = :provided`.

The user can provide a gradient for `logp` via the `gradlogp` option:

```
# Let us calculate the gradient explicitly
gradlogp(x) = -x
q, logev = VI(logp, x₀, gradlogp = gradlogp, S = 200, iterations = 30, show_every = 1, gradientmode = :provided)
```

In this case, `VI` will internally use the `Optim.LBFGS` optimiser. Again, we arrive at a result in fewer iterations than in the gradient-free case.

Even if a gradient has been explicitly provided via the `gradlogp` option, the user still needs to specify `gradientmode = :provided` to instruct `VI` to use the provided gradient.

## Evaluating the lower bound on test samples

The option `S` specifies the number of samples used to approximate the expected lower bound; see the Technical description. The higher the value of `S`, the better the approximation, but at a higher computational cost. The lower the value of `S`, the faster the computation, but the poorer the approximation may be. Hence, when setting `S` we need to take this trade-off into account.
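As an illustrative sketch of this trade-off, reusing `logp` and `x₀` from above (the values of `S` are arbitrary):

```
# Fewer samples: faster per iteration, but a noisier approximation
q_fast, _ = VI(logp, x₀, S = 50, iterations = 1_000)
# More samples: slower per iteration, but a more accurate approximation
q_slow, _ = VI(logp, x₀, S = 1_000, iterations = 1_000)
```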

Function `VI` offers a mechanism that informs us whether `S` has been set to a sufficiently high value. This mechanism makes use of two options, namely `Stest` and `test_every`. Option `Stest` defines the number of test samples used exclusively for evaluating (*not optimising!*) the Kullback-Leibler divergence every `test_every` iterations. Monitoring the Kullback-Leibler divergence in this way offers an effective way of detecting whether `S` has been set sufficiently high.

Function `VI` will report, every `test_every` iterations, the value of the lower bound evaluated on the `Stest` test samples.