BarkerMCMC.jl

Implements an adaptive Markov chain Monte Carlo sampler that makes use of gradient information. The sampler was proposed by Livingstone et al. (2021).

The adaptive preconditioning is based on Andrieu and Thoms (2008), Algorithm 4 in Section 5. We followed Algorithm 7.2 of the supporting information of Livingstone et al. (2021) with slight modifications.
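To illustrate the core idea, below is a minimal, non-adaptive sketch of the Barker proposal: each coordinate of a Gaussian increment keeps or flips its sign with a gradient-dependent probability, so that moves towards higher density are favored. The actual sampler additionally applies the Metropolis–Hastings correction and the adaptive preconditioning mentioned above.

# Minimal sketch of an isotropic, non-adaptive Barker proposal
function barker_proposal(x, ∇log_p; σ = 1.0)
    g = ∇log_p(x)
    z = σ .* randn(length(x))
    # keep z[i] with probability 1/(1 + exp(-z[i]*g[i])), otherwise flip its sign
    b = [rand() < 1 / (1 + exp(-z[i] * g[i])) ? 1.0 : -1.0 for i in eachindex(x)]
    x .+ b .* z
end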

You can find the repository with the source code here.

Usage

You can either define the log density compatible with LogDensityProblems.jl, or provide the log density and its gradient as two separate functions.

LogDensityProblems interface

This approach is recommended for most modeling tasks. This example demonstrates Bayesian inference for a simple regression. Note that we use TransformVariables.jl to ensure that the standard deviation is always positive; the package takes care of the Jacobian (determinant) correction.
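Before turning to the regression example, here is a minimal sketch of what a hand-written target implementing the LogDensityProblems.jl interface looks like (a toy standard-normal density; the example below builds its target with TransformedLogDensity instead):

using LogDensityProblems

struct StdNormal2D end

LogDensityProblems.dimension(::StdNormal2D) = 2
LogDensityProblems.logdensity(::StdNormal2D, x) = -0.5 * sum(abs2, x)
LogDensityProblems.capabilities(::Type{StdNormal2D}) =
    LogDensityProblems.LogDensityOrder{0}()

# a gradient can then be attached with LogDensityProblemsAD,
# e.g. ADgradient(:ForwardDiff, StdNormal2D())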

using BarkerMCMC

using TransformVariables: transform, inverse, as, asℝ, asℝ₊
using TransformedLogDensities: TransformedLogDensity
using LogDensityProblemsAD: ADgradient
using Distributions

# -----------
# simulate data

x = rand(10)
y = 2 .+ 1.5*x .+ randn(10)*0.2


# -----------
# define model

model(x, θ) = θ.a + θ.b*x

function likelihood(θ, x, y)
    ll = 0.0
    for i in eachindex(y)
        ll += logpdf(Normal(model(x[i], θ), θ.σ), y[i])
    end
    ll
end

function prior(θ)
    logpdf(Normal(0,1), θ.a) +
        logpdf(Normal(0,1), θ.b) +
        logpdf(Exponential(1), θ.σ)
end

posterior(θ, x, y) = likelihood(θ, x, y) + prior(θ)

# transformation σ to [0, ∞)
trans = as((a = asℝ, b = asℝ, σ = asℝ₊))

# -----------
# define lp

lp = TransformedLogDensity(trans, θ -> posterior(θ, x, y))

# define gradient with AD
lp = ADgradient(:ForwardDiff, lp)
# lp = ADgradient(:Zygote, lp)  # we can use different AD backends


# -----------
# sample

# we need the inits in transformed space
inits = inverse(trans, (a=0, b=2, σ=0.5))

results = barker_mcmc(lp,
                      inits;
                      n_iter = 10_000)

# back-transform samples to original parameter space
samples = [transform(trans, s)
           for s in eachrow(results.samples)]

# convert to array
samplesArray = vcat((hcat(i...) for i in samples)...)

See the example below for how the results can be visualized.
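For instance, assuming MCMCChains.jl and StatsPlots.jl are installed (they are used the same way in the function-interface example below), the back-transformed samples can be wrapped in a Chains object with the parameter names of the transformation:

using MCMCChains, StatsPlots

chain_reg = Chains(samplesArray, [:a, :b, :σ])
plot(chain_reg[1_000:10:end])    # discard burn-in and thin before plotting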

Function interface

When no parameter transformations are required, the function interface can be a bit simpler to work with. Here we sample from the 'banana-shaped' Rosenbrock function:

using BarkerMCMC

# --- Define target distribution and its gradient
#     (or use automatic differentiation)

function log_p_rosebruck_2d(x; k=1/200)
    -k*(100*(x[2] - x[1]^2)^2 + (1 - x[1])^2)
end

function ∇log_p_rosebruck_2d(x; k=1/200)
    [-2*k*(200*x[1]^3 - 200*x[1]*x[2] + x[1] -1),   # d/dx[1]
     -200*k*(x[2] - x[1]^2)]                        # d/dx[2]
end

# --- Generate samples

res = barker_mcmc(log_p_rosebruck_2d,
                  ∇log_p_rosebruck_2d,
                  [5.0, 5.0];
                  n_iter = 1_000,
                  target_acceptance_rate=0.4)

res.samples
res.log_p

# --- Visualize results

# acceptance rate
length(unique(res.samples[:,1])) / size(res.samples, 1)

# You may want to use `MCMCChains.jl` for plots and diagnostics
# (must be installed separately)

using MCMCChains
using StatsPlots

chain = Chains(res.samples, [:x1, :x2])
chain = chain[200:10:end]          # remove burn-in and apply thinning

plot(chain)
meanplot(chain)
histogram(chain)
autocorplot(chain)
corner(chain)
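Assuming the same MCMCChains.jl setup, numerical summaries are available as well:

describe(chain)          # posterior summaries and quantiles
summarystats(chain)      # summary statistics table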

API

BarkerMCMC.barker_mcmc — Function

Adaptive MCMC sampler that makes use of gradient information, based on Livingstone et al. (2021). The adaptation is based on Andrieu and Thoms (2008), Algorithm 4 in Section 5.

barker_mcmc(lp,
            inits::AbstractVector;
            n_iter = 100::Int,
            σ = 2.4/(length(inits)^(1/6)),
            target_acceptance_rate = 0.4,
            κ::Float64 = 0.6,
            n_iter_adaptation = Inf,
            preconditioning::Function = BarkerMCMC.precond_eigen)

or

barker_mcmc(log_p::Function, ∇log_p::Function,
            inits::AbstractVector;
            n_iter = 100::Int,
            σ = 2.4/(length(inits)^(1/6)),
            target_acceptance_rate = 0.4,
            κ::Float64 = 0.6,
            n_iter_adaptation = Inf,
            preconditioning::Function = BarkerMCMC.precond_eigen)

Arguments

  • lp: log density object that supports the API of the LogDensityProblems.jl package
  • log_p::Function: function returning the log of the (non-normalized) density
  • ∇log_p::Function: function returning the gradient of log of the (non-normalized) density
  • inits::Vector: initial starting values
  • n_iter = 100: number of iterations
  • σ = 2.4/(length(inits)^(1/6)): global scale of proposal distribution
  • target_acceptance_rate = 0.4: desired acceptance rate
  • κ = 0.6: controls the speed of adaptation, κ ∈ (0.5, 1). Larger values lead to slower adaptation; see Section 6.1 in Livingstone et al. (2021).
  • n_iter_adaptation = Inf: number of iterations with adaptation
  • preconditioning::Function = BarkerMCMC.precond_eigen: either BarkerMCMC.precond_eigen or BarkerMCMC.precond_cholesky. Computing the preconditioning matrix with a Cholesky decomposition is slightly cheaper; however, the eigenvalue decomposition allows for a proper rotation of the proposal distribution (see the example below).
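For example, the adaptation can be limited to an initial phase and the Cholesky-based preconditioning selected (reusing the Rosenbrock target from the function-interface example above):

res = barker_mcmc(log_p_rosebruck_2d, ∇log_p_rosebruck_2d, [5.0, 5.0];
                  n_iter = 5_000,
                  n_iter_adaptation = 2_000,   # adapt only during the first 2000 iterations
                  preconditioning = BarkerMCMC.precond_cholesky)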

Return Value

A named tuple with fields:

  • samples: array containing the samples
  • log_p: vector containing the value of log_p for each sample.

References

Andrieu, C., Thoms, J., 2008. A tutorial on adaptive MCMC. Statistics and Computing 18, 343–373.

Livingstone, S., Zanella, G., 2021. The Barker proposal: Combining robustness and efficiency in gradient-based MCMC. Journal of the Royal Statistical Society: Series B (Statistical Methodology). https://doi.org/10.1111/rssb.12482

BarkerMCMC.precond_cholesky — Function

Given a covariance matrix Σ, computes the preconditioning matrix M based on a Cholesky decomposition.

For M it holds that cov(M * z) == Σ, where z is a vector of uncorrelated random variables with zero mean and unit variance.
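This property can be sketched with the standard Cholesky factor from LinearAlgebra (for illustration only; the exact call signature of BarkerMCMC.precond_cholesky is not shown here):

using LinearAlgebra

Σ = [2.0 0.5; 0.5 1.0]
M = cholesky(Σ).L      # lower-triangular factor with M * M' == Σ
M * M' ≈ Σ             # hence cov(M * z) == Σ when cov(z) == I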

BarkerMCMC.precond_eigen — Function

Given a covariance matrix Σ, computes the preconditioning matrix M based on an eigenvalue decomposition.

For M it holds that cov(M * z) == Σ, where z is a vector of uncorrelated random variables with zero mean and unit variance.
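An eigenvalue decomposition yields a factor with the same property while also capturing the principal axes (rotation) of Σ; a sketch with LinearAlgebra, not the package's internal code:

using LinearAlgebra

Σ = [2.0 0.5; 0.5 1.0]
λ, V = eigen(Symmetric(Σ))
M = V * Diagonal(sqrt.(λ))   # M * M' == Σ; the columns of V define the rotation
M * M' ≈ Σ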


Literature

Andrieu, C., Thoms, J., 2008. A tutorial on adaptive MCMC. Statistics and Computing 18, 343–373.

Livingstone, S., Zanella, G., 2021. The Barker proposal: Combining robustness and efficiency in gradient-based MCMC. Journal of the Royal Statistical Society: Series B (Statistical Methodology). https://doi.org/10.1111/rssb.12482