MCMC Estimation

Sampler

BayesFlux.MCMCState (Type)
abstract type MCMCState end

Every MCMC method must be implemented via an MCMCState, which keeps track of all information the sampler needs across iterations.

Mandatory Fields

  • samples::Matrix of dimension (number of total parameters) × num_samples
  • nsampled: the number of samples drawn so far

Mandatory Functions

  • update!(sampler, θ, bnn, ∇θ) where θ is the current parameter vector and ∇θ(θ) is a function returning the gradient at θ. The function must return the updated θ and the number of samples drawn so far.
  • initialise!(sampler, θ, numsamples; continue_sampling) which initialises the sampler. If continue_sampling is true, the final goal is to obtain numsamples samples in total, so only the remaining ones still need to be drawn.
  • calculate_epochs(sampler, numbatches, numsamples; continue_sampling) which returns the number of epochs that must be run in order to obtain numsamples samples when each epoch consists of numbatches batches. If continue_sampling is true, the goal is to obtain numsamples samples in total, so only the number of epochs still needed to reach that total must be returned, NOT the number of epochs needed to draw numsamples new samples. A minimal sketch of this interface follows below.
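
For illustration, here is a minimal sketch of a custom sampler implementing this interface: a plain Gaussian random walk that ignores the gradient. The name MyRWSampler and all internals are hypothetical; only the fields and functions named above belong to the interface.

using Random

# Hypothetical sampler: a Gaussian random walk, for illustration only.
mutable struct MyRWSampler{T} <: MCMCState
    samples::Matrix{T}   # (number of total parameters) × num_samples
    nsampled::Int        # number of samples drawn so far
    stepsize::T
end

MyRWSampler(stepsize::T = 0.01f0) where {T} = MyRWSampler(Matrix{T}(undef, 0, 0), 0, stepsize)

function initialise!(s::MyRWSampler{T}, θ::AbstractVector{T}, numsamples::Int;
                     continue_sampling = false) where {T}
    if continue_sampling
        # Grow the sample matrix so that numsamples columns exist in total.
        extra = numsamples - s.nsampled
        s.samples = hcat(s.samples[:, 1:s.nsampled], Matrix{T}(undef, length(θ), extra))
    else
        s.samples = Matrix{T}(undef, length(θ), numsamples)
        s.nsampled = 0
    end
    return nothing
end

function calculate_epochs(s::MyRWSampler, numbatches, numsamples; continue_sampling = false)
    # One sample per batch; only count epochs still needed to reach the total.
    remaining = continue_sampling ? numsamples - s.nsampled : numsamples
    return ceil(Int, remaining / numbatches)
end

function update!(s::MyRWSampler{T}, θ::AbstractVector{T}, bnn, ∇θ) where {T}
    θ = θ .+ s.stepsize .* randn(T, length(θ))   # random-walk step, always accepted
    s.nsampled += 1
    s.samples[:, s.nsampled] = θ
    return θ, s.nsampled
end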
BayesFlux.mcmc (Function)
mcmc(args...; kwargs...)

Sample from a BNN using MCMC

Arguments

  • bnn: a Bayesian Neural Network
  • batchsize: the size of each minibatch
  • numsamples: number of samples to take
  • sampler: sampler to use

Keyword Arguments

  • shuffle::Bool=true: should data be shuffled after each epoch such that batches are different in each epoch?
  • partial::Bool=true: are partial batches allowed? If true, some batches might be smaller than batchsize
  • showprogress::Bool=true: should a progress bar be shown?
  • continue_sampling::Bool=false: If true and numsamples is larger than sampler.nsampled, then additional samples will be taken until numsamples samples exist in total
  • θstart::AbstractVector{T}=vcat(bnn.init()...): starting parameter vector
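
A typical call might look like the following sketch. It assumes a BNN bnn constructed elsewhere and uses SGLD with its default constructor arguments; the exact defaults may differ.

using BayesFlux

# Assumes `bnn` is a Bayesian Neural Network built elsewhere with BayesFlux.
sampler = SGLD()                          # any MCMCState implementation works here
ch = mcmc(bnn, 32, 10_000, sampler)       # batchsize 32, 10_000 samples

# Later: extend the same chain to 20_000 samples in total.
ch = mcmc(bnn, 32, 20_000, sampler; continue_sampling = true)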
BayesFlux.SGLD (Type)

Stochastic Gradient Langevin Dynamics as proposed in

Welling, M., & Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML), 681-688.

Fields

  • θ::AbstractVector: Current sample
  • samples::Matrix: Matrix of samples. Not all columns will be actual samples if sampling was stopped early. See nsampled for the actual number of samples taken.
  • nsampled::Int: Number of samples taken
  • min_stepsize::T: Stop decreasing the stepsize when it is below this value.
  • didinform::Bool: Flag keeping track of whether the user was informed that min_stepsize was reached.
  • stepsize_a::T: See the stepsize schedule sketched below.
  • stepsize_b::T: See the stepsize schedule sketched below.
  • stepsize_γ::T: See the stepsize schedule sketched below.
  • maxnorm::T: Maximum gradient norm. Gradients are clipped if their norm exceeds this value.
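
The three stepsize fields presumably parameterise the usual Welling-Teh decreasing schedule ε_t = a(b + t)^(-γ). A sketch of a single SGLD step under that schedule, with names taken from the field list; the sign convention assumes g is a stochastic gradient of the log-posterior:

# One SGLD step, illustrating stepsize_a, stepsize_b, stepsize_γ and min_stepsize.
function sgld_step(θ, g, t; a = 0.1, b = 1.0, γ = 0.55, min_stepsize = 1e-6)
    ϵ = max(a * (b + t)^(-γ), min_stepsize)          # decreasing stepsize, floored
    return θ .+ (ϵ / 2) .* g .+ sqrt(ϵ) .* randn(length(θ))
end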
BayesFlux.SGNHTS (Type)

Stochastic Gradient Nosé-Hoover Thermostat as proposed in

Leimkuhler, B., & Shang, X. (2016). Adaptive thermostats for noisy gradient systems. SIAM Journal on Scientific Computing, 38(2), A712-A736.

This is similar to SGNHT as proposed in Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., & Neven, H. (2014). Bayesian sampling using stochastic gradient thermostats. Advances in neural information processing systems, 27.

Fields

  • samples::Matrix: Matrix containing the samples.
  • nsampled::Int: Number of samples taken so far. Can be smaller than size(samples, 2) if sampling was interrupted.
  • p::AbstractVector: Momentum
  • xi::Number: Thermostat
  • l::Number: Stepsize. This is often in the 0.001-0.1 range.
  • σA::Number: Diffusion factor. If the stepsize is small, this should be larger than 1.
  • μ::Number: Free parameter in thermostat. Defaults to 1.
  • t::Int: Current step count
  • kinetic::Vector: Keeps track of the kinetic energy. The goal of SGNHT is to keep its average close to one.
BayesFlux.SGNHT (Type)

Stochastic Gradient Nosé-Hoover Thermostat as proposed in

Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., & Neven, H. (2014). Bayesian sampling using stochastic gradient thermostats. Advances in neural information processing systems, 27.

Fields

  • samples::Matrix: Matrix containing the samples.
  • nsampled::Int: Number of samples taken so far. Can be smaller than size(samples, 2) if sampling was interrupted.
  • p::AbstractVector: Momentum
  • xi::Number: Thermostat
  • l::Number: Stepsize
  • A::Number: Diffusion factor
  • t::Int: Current step count
  • kinetic::Vector: Keeps track of the kinetic energy. The goal of SGNHT is to keep its average close to one (see the sketch below).
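
For intuition, one step of the Ding et al. (2014) dynamics with stepsize h and diffusion factor A reads roughly as follows; the thermostat variable ξ pulls the average kinetic energy p'p/n towards 1. This is an illustrative sketch, not the package's internal code, which may differ in scaling details.

using LinearAlgebra

# One SGNHT step (Ding et al., 2014), illustrative only.
# g is a stochastic gradient of the negative log-posterior at θ.
function sgnht_step(θ, p, ξ, g, h, A)
    n = length(θ)
    p = p .- ξ .* h .* p .- h .* g .+ sqrt(2A * h) .* randn(n)
    θ = θ .+ h .* p
    kinetic = dot(p, p) / n
    ξ = ξ + h * (kinetic - 1)        # thermostat: drives average kinetic energy to 1
    return θ, p, ξ, kinetic
end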
BayesFlux.GGMC (Type)

Gradient Guided Monte Carlo

Proposed in Garriga-Alonso, A., & Fortuin, V. (2021). Exact Langevin dynamics with stochastic gradients. arXiv preprint arXiv:2102.01691.

Fields

  • samples::Matrix: Matrix containing the samples. If sampling stopped early, then not all columns will actually correspond to samples. See nsampled to check how many samples were actually taken
  • nsampled::Int: Number of samples taken.
  • t::Int: Total number of steps taken.
  • accepted::Vector{Bool}: If true, sample was accepted; If false, proposed sample was rejected and previous sample was taken.
  • β::T: See paper.
  • l::T: Step-length; See paper.
  • sadapter::StepsizeAdapter: A StepsizeAdapter. Default is DualAveragingStepSize
  • M::AbstractMatrix: Mass Matrix
  • Mhalf::AbstractMatrix: Lower triangular Cholesky factor of M.
  • Minv::AbstractMatrix: Inverse mass matrix.
  • madapter::MassAdapter: A MassAdapter
  • momentum::AbstractVector: Last momentum vector
  • lMH::T: Log of the Metropolis-Hastings ratio, accumulated over steps.
  • steps::Int: Number of steps to take before calculating the MH ratio (see the block-acceptance sketch below).
  • current_step::Int: Current position in the repeating sequence 1, ..., steps.
  • maxnorm::T: Maximum gradient norm. Gradients are clipped if their norm exceeds this value.
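
The steps, current_step, and lMH fields implement a block Metropolis-Hastings correction: the log-ratio is accumulated over steps moves and a single accept/reject decision is made at the end of the block. The following toy shows only that accept/reject pattern with a symmetric random-walk proposal; GGMC's actual proposal is gradient-guided with momentum, and its log-ratio includes the corresponding kinetic terms.

using Random

# Toy illustration of block acceptance: accumulate the log-MH ratio over
# `steps` moves, then accept or reject the whole block at once.
function block_mh(θ, logp, propose, steps)
    θold, lMH = copy(θ), 0.0
    for _ in 1:steps
        θnew = propose(θ)
        lMH += logp(θnew) - logp(θ)   # symmetric proposal assumed
        θ = θnew
    end
    return log(rand()) < lMH ? θ : θold
end

# Example: standard normal target, Gaussian random-walk proposal.
logp(θ) = -sum(abs2, θ) / 2
propose(θ) = θ .+ 0.1 .* randn(length(θ))
θ = block_mh(randn(3), logp, propose, 3)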
BayesFlux.HMC (Type)

Standard Hamiltonian Monte Carlo (Hybrid Monte Carlo).

Allows for the use of stochastic gradients, but the validity of doing so is not clear.

This is motivated by parts of the discussion in Neal, R. M. (1996). Bayesian Learning for Neural Networks (Vol. 118). Springer New York. https://doi.org/10.1007/978-1-4612-0745-0

Code was partially adapted from https://colindcarroll.com/2019/04/11/hamiltonian-monte-carlo-from-scratch/

Fields

  • samples::Matrix: Samples taken
  • nsampled::Int: Number of samples taken. Might be smaller than size(samples, 2) if sampling was interrupted.
  • θold::AbstractVector: Old sample. Kept for rejection step.
  • momentum::AbstractVector: Momentum variables
  • momentumold::AbstractVector: Old momentum variables. Kept for rejection step.
  • t::Int: Current step.
  • path_len::Int: Number of leapfrog steps (see the leapfrog sketch below).
  • current_step::Int: Current leapfrog step.
  • accepted::Vector{Bool}: Whether a draw in samples was an accepted draw or a rejected one (in which case it equals the previous draw).
  • sadapter::StepsizeAdapter: Stepsize adapter giving the stepsize in each iteration.
  • l: Stepsize.
  • madapter::MassAdapter: Mass matrix adapter giving the inverse mass matrix in each iteration.
  • Minv::AbstractMatrix: Inverse mass matrix
  • maxnorm::T: Maximum gradient norm. Gradients are clipped if their norm exceeds this value.
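
A standard leapfrog integrator with path_len steps, stepsize l, and inverse mass matrix Minv looks roughly like the following sketch, in the spirit of the blog post linked above; this is not the package's internal function, and ∇U denotes the gradient of the negative log-posterior.

using LinearAlgebra

# Leapfrog integration of Hamiltonian dynamics, illustrative only.
function leapfrog(θ, p, ∇U, l, path_len, Minv)
    p = p .- (l / 2) .* ∇U(θ)            # initial half-step for momentum
    for _ in 1:(path_len - 1)
        θ = θ .+ l .* (Minv * p)         # full step for position
        p = p .- l .* ∇U(θ)              # full step for momentum
    end
    θ = θ .+ l .* (Minv * p)
    p = p .- (l / 2) .* ∇U(θ)            # final half-step for momentum
    return θ, -p                          # negate momentum to keep the proposal reversible
end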
BayesFlux.AdaptiveMH (Type)

Adaptive Metropolis Hastings as introduced in

Haario, H., Saksman, E., & Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 223-242.

Fields

  • samples::Matrix: Matrix holding the samples. If sampling was stopped early, not all columns will represent samples. To figure out how many columns represent samples, see nsampled.
  • nsampled::Int: Number of samples obtained.
  • C0::Matrix: Initial covariance matrix.
  • Ct::Matrix: Covariance matrix in iteration t
  • t::Int: Current time period
  • t0::Int: When to start adapting the covariance matrix. The covariance is adapted over a rolling window.
  • sd::T: See the paper.
  • ϵ::T: Will be added to the diagonal to prevent numerical non-positive-definiteness problems. If you run into numerical problems, try increasing this value.
  • accepted::Vector{Bool}: For each sample, indicates whether the sample was accepted (true) or the previous sample was kept (false).

Notes

  • Adaptive MH may be poorly suited when likelihood evaluations are costly, since the likelihood must be evaluated on the full dataset for every sample. Plans exist to make this faster.
  • Works best when started at a MAP estimate. The covariance adaptation rule is sketched below.
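
The proposal covariance of Haario et al. at step t is sd · Cov(past samples) + sd · ϵ · I, here computed over a rolling window as noted above. A sketch, with sd and ϵ as in the field list:

using Statistics, LinearAlgebra

# Adaptive-MH proposal covariance over a rolling window (Haario et al., 2001).
# X is a d × n matrix whose columns are the samples in the current window.
function adapt_covariance(X, sd, ϵ)
    d = size(X, 1)
    return sd .* cov(X; dims = 2) .+ (sd * ϵ) .* I(d)
end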

Mass Adaptation

BayesFlux.MassAdapter (Type)

Adapt the mass matrix in MCMC and especially dynamic MCMC methods such as HMC, GGMC, SGLD, SGNHT, ...

Mandatory Fields

  • Minv::AbstractMatrix: The inverse mass matrix used in HMC, GGMC, ...

Mandatory Functions

  • (madapter::MassAdapter)(s::MCMCState, θ::AbstractVector, bnn, ∇θ): Every mass adapter must be callable with the sampler state, the current sample, the BNN, and a gradient function as arguments. It must return the new Minv matrix. A minimal example follows below.
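
A minimal custom adapter satisfying this interface might look as follows: it never adapts and always returns the identity. The name IdentityMassAdapter is hypothetical.

using LinearAlgebra

# Hypothetical adapter that never adapts: always the identity inverse mass matrix.
mutable struct IdentityMassAdapter{T} <: MassAdapter
    Minv::Matrix{T}
end

IdentityMassAdapter(T = Float32) = IdentityMassAdapter(Matrix{T}(undef, 0, 0))

function (madapter::IdentityMassAdapter)(s::MCMCState, θ::AbstractVector, bnn, ∇θ)
    # Lazily size the identity to the parameter dimension, then keep it fixed.
    madapter.Minv = Matrix{eltype(θ)}(I, length(θ), length(θ))
    return madapter.Minv
end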
BayesFlux.DiagCovMassAdapter (Type)

Use the sample variances as the diagonal of the inverse mass matrix, as used in HMC, GGMC, ...

Fields

  • Minv: Inverse mass matrix as used in HMC, SGLD, GGMC, ...
  • adapt_steps: Number of adaptation steps.
  • windowlength: Lookback length for calculation of covariance.
  • t: Current step.
  • kappa: How much to shrink towards the identity.
  • epsilon: Small value to add to diagonal to avoid numerical instability.
BayesFlux.FullCovMassAdapter (Type)

Use the full covariance matrix of a moving average of samples as the mass matrix. This is similar to what is already done in Adaptive MH.

Fields

  • Minv: Inverse mass matrix as used in HMC, SGLD, GGMC, ...
  • adapt_steps: Number of adaptation steps.
  • windowlength: Lookback length for calculation of covariance.
  • t: Current step.
  • kappa: How much to shrink towards the identity (see the sketch below).
  • epsilon: Small value to add to the diagonal to avoid numerical instability.
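
Both covariance-based adapters shrink an empirical (co)variance estimate towards the identity by kappa and add epsilon to the diagonal for stability. Roughly, and with the exact weighting in the package possibly differing:

using Statistics, LinearAlgebra

# Shrinkage towards the identity, illustrative of both covariance adapters.
# X: d × n window of samples; kappa: shrinkage weight; epsilon: diagonal jitter.
shrunk_full(X, kappa, epsilon) =
    (1 - kappa) .* cov(X; dims = 2) .+ (kappa + epsilon) .* I(size(X, 1))
shrunk_diag(X, kappa, epsilon) =
    Diagonal((1 - kappa) .* vec(var(X; dims = 2)) .+ kappa .+ epsilon)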
BayesFlux.RMSPropMassAdapter (Type)

Use RMSProp as a preconditioner/mass-matrix adapter. This was proposed for use in SGLD and related methods in

Li, C., Chen, C., Carlson, D., & Carin, L. (2016). Preconditioned stochastic gradient Langevin dynamics for deep neural networks. In Thirtieth AAAI Conference on Artificial Intelligence.
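
The idea of Li et al. (2016) is to keep an exponential moving average of squared gradients and use its inverse square root as a diagonal preconditioner. Schematically, with the names α and λ being illustrative:

using LinearAlgebra

# RMSProp-style preconditioner, illustrative of Li et al. (2016).
# v is the running average of squared gradients; g a fresh stochastic gradient.
function rmsprop_precondition!(v, g; α = 0.99, λ = 1e-5)
    @. v = α * v + (1 - α) * g^2
    return Diagonal(1 ./ (sqrt.(v) .+ λ))   # diagonal inverse mass matrix
end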

Stepsize Adaptation

BayesFlux.StepsizeAdapter (Type)

Adapt the stepsize of MCMC algorithms.

Implementation Details

Mandatory Fields

  • l::Number: The stepsize to be used by the sampler.

Mandatory Functions

  • (sadapter::StepsizeAdapter)(s::MCMCState, mh_probability::Number): Every stepsize adapter must be callable with the sampler itself and the Metropolis-Hastings acceptance probability as arguments. The method must return the new stepsize. A minimal example follows below.
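
A minimal adapter satisfying this interface might simply keep the stepsize constant; the name ConstantStepsize is hypothetical.

# Hypothetical adapter implementing the callable interface: a constant stepsize.
mutable struct ConstantStepsize{T} <: StepsizeAdapter
    l::T
end

function (sadapter::ConstantStepsize)(s::MCMCState, mh_probability::Number)
    return sadapter.l   # never adapts, but still returns the (new) stepsize
end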
BayesFlux.DualAveragingStepSize (Type)

Use the Dual Average method to tune the stepsize.

The use of the Dual Average method was proposed in:

Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593-1623.
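
The dual averaging recursion (Hoffman & Gelman, 2014, Section 3.2) nudges the stepsize so that the average Metropolis-Hastings acceptance probability approaches a target δ. A self-contained sketch with the paper's default constants; the package's field names may differ.

# Dual-averaging stepsize adaptation (Hoffman & Gelman, 2014), illustrative.
mutable struct DualAvg
    μ::Float64       # log(10 * initial stepsize), the shrinkage target
    δ::Float64       # target acceptance probability
    γ::Float64       # shrinkage strength
    t0::Float64      # stabilises early iterations
    κ::Float64       # decay rate of the averaging weights
    Hbar::Float64    # running average of δ minus acceptance probability
    logϵbar::Float64
    t::Int
end

DualAvg(initial_ϵ; δ = 0.65, γ = 0.05, t0 = 10.0, κ = 0.75) =
    DualAvg(log(10 * initial_ϵ), δ, γ, t0, κ, 0.0, 0.0, 0)

function adapt!(da::DualAvg, accept_prob)
    da.t += 1
    η = 1 / (da.t + da.t0)
    da.Hbar = (1 - η) * da.Hbar + η * (da.δ - accept_prob)
    logϵ = da.μ - sqrt(da.t) / da.γ * da.Hbar
    w = da.t^(-da.κ)
    da.logϵbar = w * logϵ + (1 - w) * da.logϵbar
    return exp(logϵ)       # after adaptation ends, use exp(da.logϵbar)
end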