# MCMC Estimation

## Sampler

`BayesFlux.MCMCState`

— Type `abstract type MCMCState end`

Every MCMC method must be implemented via an `MCMCState`, which keeps track of all important information.

**Mandatory Fields**

`samples::Matrix`

: of dimension `num_total_parameter × num_samples`

`nsampled`

: the number of samples taken so far

**Mandatory Functions**

`update!(sampler, θ, bnn, ∇θ)`

: where `θ` is the current parameter vector and `∇θ(θ)` is a function providing gradients. The function must return the updated `θ` and the number of samples taken so far.

`initialise!(sampler, θ, numsamples; continue_sampling)`

: initialises the sampler. If `continue_sampling` is true, the final goal is to obtain `numsamples` samples in total, and thus only the remaining ones still need to be sampled.

`calculate_epochs(sampler, numbatches, numsamples; continue_sampling)`

: calculates and returns the number of epochs that must be run in order to obtain `numsamples` samples when `numbatches` batches are used per epoch. If `continue_sampling` is true, the goal is to obtain `numsamples` samples in total, and thus only the number of epochs still needed to reach that total must be returned, NOT the number of epochs needed to draw `numsamples` new samples.
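To make the interface concrete, here is a minimal, self-contained sketch of an `MCMCState` implementation. The `RandomWalkState` type and its Langevin-style dynamics are invented purely for illustration and are not part of BayesFlux; `∇θ` is assumed here to return the gradient vector directly.

```julia
# Minimal illustrative implementation of the MCMCState interface described
# above. The random-walk dynamics are invented for illustration only.
abstract type MCMCState end

mutable struct RandomWalkState <: MCMCState
    samples::Matrix{Float64}   # num_total_parameter × num_samples
    nsampled::Int              # number of samples taken so far
    stepsize::Float64
end

function initialise!(s::RandomWalkState, θ, numsamples; continue_sampling = false)
    if continue_sampling
        extra = numsamples - size(s.samples, 2)   # grow towards the new total
        s.samples = hcat(s.samples, zeros(length(θ), extra))
    else
        s.samples = zeros(length(θ), numsamples)
        s.nsampled = 0
    end
    return s
end

function update!(s::RandomWalkState, θ, bnn, ∇θ)
    g = ∇θ(θ)                  # assumed here to return the gradient vector
    θ = θ .+ s.stepsize .* g .+ sqrt(2 * s.stepsize) .* randn(length(θ))
    s.nsampled += 1
    s.samples[:, s.nsampled] = θ
    return θ, s.nsampled
end

calculate_epochs(s::RandomWalkState, numbatches, numsamples; continue_sampling = false) =
    cld(continue_sampling ? numsamples - s.nsampled : numsamples, numbatches)
```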

`BayesFlux.mcmc`

— Function `mcmc(args...; kwargs...)`

Sample from a BNN using MCMC.

**Arguments**

`bnn`

: a Bayesian Neural Network

`batchsize`

: batch size

`numsamples`

: number of samples to take

`sampler`

: sampler to use

**Keyword Arguments**

`shuffle::Bool=true`

: should data be shuffled after each epoch so that batches differ across epochs?

`partial::Bool=true`

: are partial batches allowed? If true, some batches might be smaller than `batchsize`.

`showprogress::Bool=true`

: should a progress bar be shown?

`continue_sampling::Bool=false`

: if true and `numsamples` is larger than `sampler.nsampled`, additional samples will be taken

`θstart::AbstractVector{T}=vcat(bnn.init()...)`

: starting parameter vector
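Putting the arguments together, a call might look like the following sketch. This is pseudocode, not a runnable example: the network and data setup are elided, and the sampler constructor arguments are placeholders rather than the package's actual signature.

```julia
# Pseudocode sketch; the SGLD constructor arguments are placeholders.
sampler = SGLD(...)                    # any MCMCState works here
ch = mcmc(bnn, 100, 10_000, sampler)   # batchsize = 100, numsamples = 10_000
# Later, extend the same chain to 20_000 total samples:
ch = mcmc(bnn, 100, 20_000, sampler; continue_sampling = true)
```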

`BayesFlux.SGLD`

— Type Stochastic Gradient Langevin Dynamics as proposed in

Welling, M., & Teh, Y. W. (2011). Bayesian Learning via Stochastic Gradient Langevin Dynamics. Proceedings of the 28th International Conference on Machine Learning, 681–688.

**Fields**

`θ::AbstractVector`

: Current sample

`samples::Matrix`

: Matrix of samples. Not all columns will be actual samples if sampling was stopped early. See `nsampled` for the actual number of samples taken.

`nsampled::Int`

: Number of samples taken

`min_stepsize::T`

: Stop decreasing the stepsize when it falls below this value.

`didinform::Bool`

: Flag keeping track of whether the user was informed that `min_stepsize` was reached.

`stepsize_a::T`

: See `stepsize`

`stepsize_b::T`

: See `stepsize`

`stepsize_γ::T`

: See `stepsize`

`maxnorm::T`

: Maximum gradient norm. Gradients are clipped if their norm exceeds this value.
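The `stepsize_a`, `stepsize_b`, and `stepsize_γ` fields parameterise the usual decaying schedule ε_t = a(b + t)^(-γ) from Welling & Teh. The following is a self-contained sketch of a single SGLD step with that schedule, a `min_stepsize` floor, and `maxnorm` gradient clipping; it mirrors the fields above but is not BayesFlux's implementation.

```julia
using LinearAlgebra

# Self-contained sketch of one SGLD step; not BayesFlux's implementation.
# ∇logp returns the gradient of the log posterior.
function sgld_step(θ, ∇logp, t; a = 0.1, b = 10.0, γ = 0.55,
                   min_stepsize = 1e-6, maxnorm = 25.0)
    ϵ = max(a * (b + t)^(-γ), min_stepsize)   # decaying stepsize, floored
    g = ∇logp(θ)
    n = norm(g)
    n > maxnorm && (g = g .* (maxnorm / n))   # gradient clipping
    return θ .+ (ϵ / 2) .* g .+ sqrt(ϵ) .* randn(length(θ))
end
```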

`BayesFlux.SGNHTS`

— Type Stochastic Gradient Nosé-Hoover Thermostat as proposed in

Leimkuhler, B., & Shang, X. (2016). Adaptive thermostats for noisy gradient systems. SIAM Journal on Scientific Computing, 38(2), A712–A736.

This is similar to SGNHT as proposed in Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., & Neven, H. (2014). Bayesian sampling using stochastic gradient thermostats. Advances in Neural Information Processing Systems, 27.

**Fields**

`samples::Matrix`

: Contains the samples

`nsampled::Int`

: Number of samples taken so far. Can be smaller than `size(samples, 2)` if sampling was interrupted.

`p::AbstractVector`

: Momentum

`xi::Number`

: Thermostat

`l::Number`

: Stepsize; this is often in the 0.001–0.1 range.

`σA::Number`

: Diffusion factor; if the stepsize is small, this should be larger than 1.

`μ::Number`

: Free parameter in the thermostat. Defaults to 1.

`t::Int`

: Current step count

`kinetic::Vector`

: Keeps track of the kinetic energy. The goal of SGNHT is to have the average close to one.

`BayesFlux.SGNHT`

— Type Stochastic Gradient Nosé-Hoover Thermostat as proposed in

Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., & Neven, H. (2014). Bayesian sampling using stochastic gradient thermostats. Advances in Neural Information Processing Systems, 27.

**Fields**

`samples::Matrix`

: Contains the samples

`nsampled::Int`

: Number of samples taken so far. Can be smaller than `size(samples, 2)` if sampling was interrupted.

`p::AbstractVector`

: Momentum

`xi::Number`

: Thermostat

`l::Number`

: Stepsize

`A::Number`

: Diffusion factor

`t::Int`

: Current step count

`kinetic::Vector`

: Keeps track of the kinetic energy. The goal of SGNHT is to have the average close to one.
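The thermostat idea can be sketched in a few lines. Below is a self-contained step of the SGNHT dynamics from Ding et al. (2014), not BayesFlux's implementation: `U` denotes the negative log posterior, `l` the stepsize, `A` the diffusion factor, and `xi` the thermostat variable, which is pushed so that the average kinetic energy `p'p/n` approaches one.

```julia
using LinearAlgebra

# Self-contained sketch of one SGNHT step (Ding et al., 2014);
# not BayesFlux's implementation. Updates θ and p in place, returns xi.
function sgnht_step!(θ, p, xi, ∇U; l = 0.01, A = 1.0)
    n = length(θ)
    p .= p .- xi * l .* p .- l .* ∇U(θ) .+ sqrt(2A * l) .* randn(n)
    θ .= θ .+ l .* p
    xi += l * (dot(p, p) / n - 1)   # drives average kinetic energy toward 1
    return xi
end
```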

`BayesFlux.GGMC`

— Type Gradient Guided Monte Carlo

Proposed in Garriga-Alonso, A., & Fortuin, V. (2021). Exact Langevin dynamics with stochastic gradients. arXiv preprint arXiv:2102.01691.

**Fields**

`samples::Matrix`

: Matrix containing the samples. If sampling stopped early, not all columns will actually correspond to samples. See `nsampled` to check how many samples were actually taken.

`nsampled::Int`

: Number of samples taken.

`t::Int`

: Total number of steps taken.

`accepted::Vector{Bool}`

: If true, the sample was accepted; if false, the proposed sample was rejected and the previous sample was kept.

`β::T`

: See paper.

`l::T`

: Step length; see paper.

`sadapter::StepsizeAdapter`

: A StepsizeAdapter. Default is `DualAveragingStepSize`.

`M::AbstractMatrix`

: Mass matrix

`Mhalf::AbstractMatrix`

: Lower-triangular Cholesky factor of `M`

`Minv::AbstractMatrix`

: Inverse mass matrix.

`madapter::MassAdapter`

: A MassAdapter

`momentum::AbstractVector`

: Last momentum vector

`lMH::T`

: Log of the Metropolis-Hastings ratio.

`steps::Int`

: Number of steps to take before calculating the MH ratio.

`current_step::Int`

: Current step in the recurring sequence 1, ..., `steps`.

`maxnorm::T`

: Maximum gradient norm. Gradients are clipped if their norm exceeds this value.

`BayesFlux.HMC`

— Type Standard Hamiltonian Monte Carlo (Hybrid Monte Carlo).

Allows for the use of stochastic gradients, but the validity of doing so is not clear.

This is motivated by parts of the discussion in Neal, R. M. (1996). Bayesian Learning for Neural Networks (Vol. 118). Springer New York. https://doi.org/10.1007/978-1-4612-0745-0

Code was partially adapted from https://colindcarroll.com/2019/04/11/hamiltonian-monte-carlo-from-scratch/

**Fields**

`samples::Matrix`

: Samples taken

`nsampled::Int`

: Number of samples taken. Might be smaller than `size(samples, 2)` if sampling was interrupted.

`θold::AbstractVector`

: Old sample. Kept for the rejection step.

`momentum::AbstractVector`

: Momentum variables

`momentumold::AbstractVector`

: Old momentum variables. Kept for the rejection step.

`t::Int`

: Current step.

`path_len::Int`

: Number of leapfrog steps.

`current_step::Int`

: Current leapfrog step.

`accepted::Vector{Bool}`

: Whether a draw in `samples` was an accepted draw or rejected (in which case it is the same as the previous one).

`sadapter::StepsizeAdapter`

: Stepsize adapter giving the stepsize in each iteration.

`l`

: Stepsize.

`madapter::MassAdapter`

: Mass matrix adapter giving the inverse mass matrix in each iteration.

`Minv::AbstractMatrix`

: Inverse mass matrix

`maxnorm::T`

: Maximum gradient norm. Gradients are clipped if their norm exceeds this value.
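The `path_len`, `l`, and `Minv` fields parameterise the leapfrog integrator at the heart of HMC. The following self-contained sketch (not BayesFlux's implementation) shows one leapfrog trajectory: `∇U` is the gradient of the negative log posterior, and the final momentum flip keeps the proposal reversible.

```julia
# Self-contained leapfrog sketch as used in HMC; not BayesFlux's implementation.
function leapfrog(θ, p, ∇U, Minv; l = 0.1, path_len = 10)
    p = p .- (l / 2) .* ∇U(θ)            # initial half step for momentum
    for _ in 1:(path_len - 1)
        θ = θ .+ l .* (Minv * p)         # full step for position
        p = p .- l .* ∇U(θ)              # full step for momentum
    end
    θ = θ .+ l .* (Minv * p)
    p = p .- (l / 2) .* ∇U(θ)            # final half step for momentum
    return θ, -p                          # momentum flip for reversibility
end
```

Because leapfrog is symplectic, the Hamiltonian is nearly conserved along the trajectory, which is what keeps Metropolis-Hastings acceptance rates high.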

`BayesFlux.AdaptiveMH`

— Type Adaptive Metropolis-Hastings as introduced in

Haario, H., Saksman, E., & Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 223–242.

**Fields**

`samples::Matrix`

: Matrix holding the samples. If sampling was stopped early, not all columns will represent samples. To find out how many columns represent samples, check `nsampled`.

`nsampled::Int`

: Number of samples obtained.

`C0::Matrix`

: Initial covariance matrix.

`Ct::Matrix`

: Covariance matrix in iteration t

`t::Int`

: Current time period

`t0::Int`

: When to start adapting the covariance matrix. The covariance is adapted in a rolling-window fashion.

`sd::T`

: See the paper.

`ϵ::T`

: Will be added to the diagonal to prevent the covariance from becoming numerically non-positive-definite. If you run into numerical problems, try increasing this value.

`accepted::Vector{Bool}`

: For each sample, indicates whether the sample was accepted (true) or the previous sample was kept (false)

**Notes**

- Adaptive MH might not be suited if it is very costly to calculate the likelihood as this needs to be done for each sample on the full dataset. Plans exist to make this faster.
- Works best when started at a MAP estimate.
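The covariance adaptation of Haario et al. can be sketched directly. Below is a self-contained illustration (not BayesFlux's implementation) of how the proposal covariance is formed from a window of past samples, with `sd` set to the paper's recommended scaling and `ϵ` stabilising the diagonal as described above.

```julia
using Statistics, LinearAlgebra

# Sketch of the covariance adaptation from Haario et al. (2001);
# not BayesFlux's implementation. X holds past samples column-wise.
function adapt_covariance(X::AbstractMatrix; ϵ = 1e-8)
    d = size(X, 1)
    sd = 2.4^2 / d                            # scaling proposed in the paper
    return sd .* cov(X, dims = 2) .+ sd * ϵ .* I(d)
end
```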

## Mass Adaptation

`BayesFlux.MassAdapter`

— Type Adapt the mass matrix in MCMC, especially in dynamic MCMC methods such as HMC, GGMC, SGLD, SGNHT, ...

**Mandatory Fields**

`Minv::AbstractMatrix`

: The inverse mass matrix used in HMC, GGMC, ...

**Mandatory Functions**

`(madapter::MassAdapter)(s::MCMCState, θ::AbstractVector, bnn, ∇θ)`

: Every mass adapter must be callable, taking the sampler state, the current sample, the BNN, and a gradient function as arguments. It must return the new `Minv` matrix.
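In Julia, this interface is naturally expressed as a callable struct. The following sketch shows the pattern; the `FixedAdapter` type is invented for illustration (it simply returns a constant inverse mass matrix) and is not part of BayesFlux.

```julia
# Illustrative sketch of the callable-adapter pattern described above.
abstract type MassAdapter end

struct FixedAdapter{M<:AbstractMatrix} <: MassAdapter
    Minv::M
end

# Making the struct callable with the mandatory argument list:
(ma::FixedAdapter)(s, θ::AbstractVector, bnn, ∇θ) = ma.Minv
```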

`BayesFlux.DiagCovMassAdapter`

— Type Use the variances as the diagonal of the inverse mass matrix, as used in HMC, GGMC, ...

**Fields**

`Minv`

: Inverse mass matrix as used in HMC, SGLD, GGMC, ...

`adapt_steps`

: Number of adaptation steps.

`windowlength`

: Lookback length for the calculation of the covariance.

`t`

: Current step.

`kappa`

: How much to shrink towards the identity.

`epsilon`

: Small value to add to the diagonal to avoid numerical instability.
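The diagonal idea can be written in a few lines. This is an illustrative sketch, not BayesFlux's implementation: per-parameter variances are computed over a window of samples, shrunk toward the identity by `kappa`, and stabilised with `epsilon`, mirroring the fields above.

```julia
using Statistics, LinearAlgebra

# Illustrative sketch: diagonal inverse mass matrix from a sample window.
function diag_minv(window::AbstractMatrix; kappa = 0.5, epsilon = 1e-6)
    v = vec(var(window, dims = 2))            # variance of each parameter
    return Diagonal((1 - kappa) .* v .+ kappa .+ epsilon)
end
```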

`BayesFlux.FullCovMassAdapter`

— Type Use the full covariance matrix of a moving average of samples as the mass matrix. This is similar to what is already done in Adaptive MH.

**Fields**

`Minv`

: Inverse mass matrix as used in HMC, SGLD, GGMC, ...

`adapt_steps`

: Number of adaptation steps.

`windowlength`

: Lookback length for the calculation of the covariance.

`t`

: Current step.

`kappa`

: How much to shrink towards the identity.

`epsilon`

: Small value to add to the diagonal to avoid numerical instability.

`BayesFlux.FixedMassAdapter`

— Type Use a fixed inverse mass matrix.

`BayesFlux.RMSPropMassAdapter`

— Type Use RMSProp as a preconditioner/mass matrix adapter, for use in SGLD and related methods. This was proposed in

Li, C., Chen, C., Carlson, D., & Carin, L. (2016). Preconditioned stochastic gradient Langevin dynamics for deep neural networks. In Thirtieth AAAI Conference on Artificial Intelligence.
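The RMSProp preconditioner keeps a running average of squared gradients and uses its inverse square root as a diagonal mass matrix. The sketch below (not BayesFlux's implementation; the smoothing constant `α` and damping `λ` are the usual defaults from the pSGLD literature) shows one update.

```julia
using LinearAlgebra

# Sketch of the RMSProp preconditioner used in preconditioned SGLD
# (Li et al., 2016); not BayesFlux's implementation.
# v is the running average of squared gradients, updated in place.
function rmsprop_minv!(v, g; α = 0.99, λ = 1e-5)
    v .= α .* v .+ (1 - α) .* g .^ 2
    return Diagonal(1 ./ (λ .+ sqrt.(v)))   # diagonal inverse mass matrix
end
```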

## Stepsize Adaptation

`BayesFlux.StepsizeAdapter`

— Type Adapt the stepsize of MCMC algorithms.

**Implementation Details**

**Mandatory Fields**

`l::Number`

: The stepsize. Will be used by the sampler.

**Mandatory Functions**

`(sadapter::StepsizeAdapter)(s::MCMCState, mh_probability::Number)`

: Every stepsize adapter must be callable, taking the sampler state and the Metropolis-Hastings acceptance probability as arguments. The method must return the new stepsize.

`BayesFlux.ConstantStepsize`

— Type Use a constant stepsize.

`BayesFlux.DualAveragingStepSize`

— Type Use the Dual Averaging method to tune the stepsize.

The use of the Dual Averaging method was proposed in:

Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593–1623.
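The dual averaging update itself is compact. Below is a self-contained sketch following Hoffman & Gelman (2014, Section 3.2), not BayesFlux's implementation; the defaults (`δ = 0.65`, `γ = 0.05`, `t0 = 10`, `κ = 0.75`, and an assumed initial stepsize of 0.1 inside `μ`) are common choices, not the package's.

```julia
# Self-contained dual averaging sketch; not BayesFlux's implementation.
Base.@kwdef mutable struct DualAvg
    μ::Float64 = log(10 * 0.1)    # log of 10 × the initial stepsize (assumed 0.1)
    δ::Float64 = 0.65             # target Metropolis-Hastings acceptance probability
    γ::Float64 = 0.05
    t0::Float64 = 10.0
    κ::Float64 = 0.75
    t::Int = 0
    Hbar::Float64 = 0.0
    logϵbar::Float64 = 0.0
end

# Callable with the MH acceptance probability, returning the new stepsize.
function (da::DualAvg)(mh_probability::Number)
    da.t += 1
    η = 1 / (da.t + da.t0)
    da.Hbar = (1 - η) * da.Hbar + η * (da.δ - mh_probability)
    logϵ = da.μ - sqrt(da.t) / da.γ * da.Hbar
    w = da.t^(-da.κ)
    da.logϵbar = w * logϵ + (1 - w) * da.logϵbar   # averaged iterate for after warmup
    return exp(logϵ)
end
```

When the chain rejects too often the stepsize shrinks, and when it accepts too often the stepsize grows, homing in on the target acceptance probability `δ`.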