MCMCDiagnosticTools
Effective sample size and $\widehat{R}$
The effective sample size (ESS) and $\widehat{R}$ can be estimated with ess_rhat
.
MCMCDiagnosticTools.ess_rhat
— Functioness_rhat(
[estimator,]
samples::AbstractArray{<:Union{Missing,Real},3};
method=ESSMethod(),
split_chains::Int=2,
maxlag::Int=250,
)
Estimate the effective sample size and $\widehat{R}$ of the samples
of shape (draws, chains, parameters)
with the method
.
maxlag
indicates the maximum lag for which autocovariance is computed.
By default, the computed ESS and $\widehat{R}$ values correspond to the estimator mean
. Other estimators can be specified by passing a function estimator
(see below).
split_chains
indicates the number of chains each chain is split into. When split_chains > 1
, then the diagnostics check for within-chain convergence. When d = mod(draws, split_chains) > 0
, i.e. the chains cannot be evenly split, then 1 draw is discarded after each of the first d
splits within each chain.
For a given estimand, it is recommended that the ESS is at least 100 * chains
and that $\widehat{R} < 1.01$.[VehtariGelman2021]
See also: ESSMethod
, FFTESSMethod
, BDAESSMethod
, ess_rhat_bulk
, ess_tail
, rhat_tail
Estimators
The ESS and $\widehat{R}$ values can be computed for the following estimators:
Statistics.mean
Statistics.median
Statistics.std
StatsBase.mad
Base.Fix2(Statistics.quantile, p::Real)
MCMCDiagnosticTools.ess_rhat_bulk
— Functioness_rhat_bulk(samples::AbstractArray{<:Union{Missing,Real},3}; kwargs...)
Estimate the bulk-effective sample size and bulk-$\widehat{R}$ values for the samples
of shape (draws, chains, parameters)
.
For a description of kwargs
, see ess_rhat
.
The bulk-ESS and bulk-$\widehat{R}$ are variants of ESS and $\widehat{R}$ that diagnose poor convergence in the bulk of the distribution due to trends or different locations of the chains. While it is conceptually related to ess_rhat
for Statistics.mean
, it is well-defined even if the chains do not have finite variance.[VehtariGelman2021]
Bulk-ESS and bulk-$\widehat{R}$ are computed by rank-normalizing the samples and then computing ess_rhat
. For each parameter, rank-normalization proceeds by first ranking the inputs using "tied ranking" and then transforming the ranks to normal quantiles so that the result is standard normally distributed. The transform is monotonic.
MCMCDiagnosticTools.ess_tail
— Functioness_tail(samples::AbstractArray{<:Union{Missing,Real},3}; tail_prob=1//10, kwargs...)
Estimate the tail-effective sample size and for the samples
of shape (draws, chains, parameters)
.
For a description of kwargs
, see ess_rhat
.
The tail-ESS diagnoses poor convergence in the tails of the distribution. Specifically, it is the minimum of the ESS of the estimate of the symmetric quantiles where tail_prob
is the probability in the tails. For example, with the default tail_prob=1//10
, the tail-ESS is the minimum of the ESS of the 0.5 and 0.95 sample quantiles.[VehtariGelman2021]
See also: ess_rhat_bulk
, rhat_tail
MCMCDiagnosticTools.rhat_tail
— Functionrhat_tail(samples::AbstractArray{Union{Real,Missing},3}; kwargs...)
Estimate the tail-$\widehat{R}$ diagnostic for the samples
of shape (draws, chains, parameters)
.
For a description of kwargs
, see ess_rhat
.
The tail-$\widehat{R}$ diagnostic is a variant of $\widehat{R}$ that diagnoses poor convergence in the tails of the distribution. In particular, it can detect chains that have similar locations but different scales.[VehtariGelman2021]
For each parameter matrix of draws x
with size (draws, chains)
, it is calculated by computing bulk-$\widehat{R}$ on the absolute deviation of the draws from the median: abs.(x .- median(x))
.
See also: ess_tail
, ess_rhat_bulk
The following methods are supported:
MCMCDiagnosticTools.ESSMethod
— TypeESSMethod <: AbstractESSMethod
The ESSMethod
uses a standard algorithm for estimating the effective sample size of MCMC chains.
It is is based on the discussion by [VehtariGelman2021] and uses the biased estimator of the autocovariance, as discussed by [Geyer1992].
MCMCDiagnosticTools.FFTESSMethod
— TypeFFTESSMethod <: AbstractESSMethod
The FFTESSMethod
uses a standard algorithm for estimating the effective sample size of MCMC chains.
The algorithm is the same as the one of ESSMethod
but this method uses fast Fourier transforms (FFTs) for estimating the autocorrelation.
To be able to use this method, you have to load a package that implements the AbstractFFTs.jl interface such as FFTW.jl or FastTransforms.jl.
MCMCDiagnosticTools.BDAESSMethod
— TypeBDAESSMethod <: AbstractESSMethod
The BDAESSMethod
uses a standard algorithm for estimating the effective sample size of MCMC chains.
It is is based on the discussion by [VehtariGelman2021]. and uses the variogram estimator of the autocorrelation function discussed by [BDA3].
Monte Carlo standard error
MCMCDiagnosticTools.mcse
— Functionmcse(x::AbstractVector{<:Real}; method::Symbol=:imse, kwargs...)
Compute the Monte Carlo standard error (MCSE) of samples x
. The optional argument method
describes how the errors are estimated. Possible options are:
:bm
for batch means [Glynn1991]:imse
initial monotone sequence estimator [Geyer1992]:ipse
initial positive sequence estimator [Geyer1992]
R⋆ diagnostic
MCMCDiagnosticTools.rstar
— Functionrstar(
rng::Random.AbstractRNG=Random.default_rng(),
classifier::MLJModelInterface.Supervised,
samples,
chain_indices::AbstractVector{Int};
subset::Real=0.7,
split_chains::Int=2,
verbosity::Int=0,
)
Compute the $R^*$ convergence statistic of the table samples
with the classifier
.
samples
must be either an AbstractMatrix
, an AbstractVector
, or a table (i.e. implements the Tables.jl interface) whose rows are draws and whose columns are parameters.
chain_indices
indicates the chain ids of each row of samples
.
This method supports ragged chains, i.e. chains of nonequal lengths.
rstar(
rng::Random.AbstractRNG=Random.default_rng(),
classifier::MLJModelInterface.Supervised,
samples::AbstractArray{<:Real,3};
subset::Real=0.7,
split_chains::Int=2,
verbosity::Int=0,
)
Compute the $R^*$ convergence statistic of the samples
with the classifier
.
samples
is an array of draws with the shape (draws, chains, parameters)
.`
This implementation is an adaption of algorithms 1 and 2 described by Lambert and Vehtari.
The classifier
has to be a supervised classifier of the MLJ framework (see the MLJ documentation for a list of supported models). It is trained with a subset
of the samples from each chain. Each chain is split into split_chains
separate chains to additionally check for within-chain convergence. The training of the classifier can be inspected by adjusting the verbosity
level.
If the classifier is deterministic, i.e., if it predicts a class, the value of the $R^*$ statistic is returned (algorithm 1). If the classifier is probabilistic, i.e., if it outputs probabilities of classes, the scaled Poisson-binomial distribution of the $R^*$ statistic is returned (algorithm 2).
The correctness of the statistic depends on the convergence of the classifier
used internally in the statistic.
Examples
julia> using MLJBase, MLJXGBoostInterface, Statistics
julia> samples = fill(4.0, 100, 3, 2);
One can compute the distribution of the $R^*$ statistic (algorithm 2) with the probabilistic classifier.
julia> distribution = rstar(XGBoostClassifier(), samples);
julia> isapprox(mean(distribution), 1; atol=0.1)
true
For deterministic classifiers, a single $R^*$ statistic (algorithm 1) is returned. Deterministic classifiers can also be derived from probabilistic classifiers by e.g. predicting the mode. In MLJ this corresponds to a pipeline of models.
julia> xgboost_deterministic = Pipeline(XGBoostClassifier(); operation=predict_mode);
julia> value = rstar(xgboost_deterministic, samples);
julia> isapprox(value, 1; atol=0.2)
true
References
Lambert, B., & Vehtari, A. (2020). $R^*$: A robust MCMC convergence diagnostic with uncertainty using decision tree classifiers.
Bayesian fraction of missing information
MCMCDiagnosticTools.bfmi
— Functionbfmi(energy::AbstractVector{<:Real}) -> Real
bfmi(energy::AbstractMatrix{<:Real}; dims::Int=1) -> AbstractVector{<:Real}
Calculate the estimated Bayesian fraction of missing information (BFMI).
When sampling with Hamiltonian Monte Carlo (HMC), BFMI quantifies how well momentum resampling matches the marginal energy distribution.
The current advice is that values smaller than 0.3 indicate poor sampling. However, this threshold is provisional and may change. A BFMI value below the threshold often indicates poor adaptation of sampling parameters or that the target distribution has heavy tails that were not well explored by the Markov chain.
For more information, see Section 6.1 of [Betancourt2018] or [Betancourt2016] for a complete account.
energy
is either a vector of Hamiltonian energies of draws or a matrix of energies of draws for multiple chains. dims
indicates the dimension in energy
that contains the draws. The default dims=1
assumes energy
has the shape draws
or (draws, chains)
. If a different shape is provided, dims
must be set accordingly.
If energy
is a vector, a single BFMI value is returned. Otherwise, a vector of BFMI values for each chain is returned.
Other diagnostics
MCMCDiagnosticTools.discretediag
— Functiondiscretediag(samples::AbstractArray{<:Real,3}; frac=0.3, method=:weiss, nsim=1_000)
Compute discrete diagnostic on samples
with shape (draws, chains, parameters)
.
method
can be one of :weiss
, :hangartner
, :DARBOOT
, :MCBOOT
, :billinsgley
, and :billingsleyBOOT
.
References
Benjamin E. Deonovic, & Brian J. Smith. (2017). Convergence diagnostics for MCMC draws of a categorical variable.
MCMCDiagnosticTools.gelmandiag
— Functiongelmandiag(samples::AbstractArray{<:Real,3}; alpha::Real=0.95)
Compute the Gelman, Rubin and Brooks diagnostics [Gelman1992] [Brooks1998] on samples
with shape (draws, chains, parameters)
. Values of the diagnostic’s potential scale reduction factor (PSRF) that are close to one suggest convergence. As a rule-of-thumb, convergence is rejected if the 97.5 percentile of a PSRF is greater than 1.2.
MCMCDiagnosticTools.gelmandiag_multivariate
— Functiongelmandiag_multivariate(samples::AbstractArray{<:Real,3}; alpha::Real=0.05)
Compute the multivariate Gelman, Rubin and Brooks diagnostics on samples
with shape (draws, chains, parameters)
.
MCMCDiagnosticTools.gewekediag
— Functiongewekediag(x::AbstractVector{<:Real}; first::Real=0.1, last::Real=0.5, kwargs...)
Compute the Geweke diagnostic [Geweke1991] from the first
and last
proportion of samples x
.
The diagnostic is designed to asses convergence of posterior means estimated with autocorrelated samples. It computes a normal-based test statistic comparing the sample means in two windows containing proportions of the first and last iterations. Users should ensure that there is sufficient separation between the two windows to assume that their samples are independent. A non-significant test p-value indicates convergence. Significant p-values indicate non-convergence and the possible need to discard initial samples as a burn-in sequence or to simulate additional samples.
MCMCDiagnosticTools.heideldiag
— Functionheideldiag(
x::AbstractVector{<:Real}; alpha::Real=0.05, eps::Real=0.1, start::Int=1, kwargs...
)
Compute the Heidelberger and Welch diagnostic [Heidelberger1983]. This diagnostic tests for non-convergence (non-stationarity) and whether ratios of estimation interval halfwidths to means are within a target ratio. Stationarity is rejected (0) for significant test p-values. Halfwidth tests are rejected (0) if observed ratios are greater than the target, as is the case for s2
and beta[1]
.
MCMCDiagnosticTools.rafterydiag
— Functionrafterydiag(
x::AbstractVector{<:Real}; q=0.025, r=0.005, s=0.95, eps=0.001, range=1:length(x)
)
Compute the Raftery and Lewis diagnostic [Raftery1992]. This diagnostic is used to determine the number of iterations required to estimate a specified quantile q
within a desired degree of accuracy. The diagnostic is designed to determine the number of autocorrelated samples required to estimate a specified quantile $\theta_q$, such that $\Pr(\theta \le \theta_q) = q$, within a desired degree of accuracy. In particular, if $\hat{\theta}_q$ is the estimand and $\Pr(\theta \le \hat{\theta}_q) = \hat{P}_q$ the estimated cumulative probability, then accuracy is specified in terms of r
and s
, where $\Pr(q - r < \hat{P}_q < q + r) = s$. Thinning may be employed in the calculation of the diagnostic to satisfy its underlying assumptions. However, users may not want to apply the same (or any) thinning when estimating posterior summary statistics because doing so results in a loss of information. Accordingly, sample sizes estimated by the diagnostic tend to be conservative (too large).
Furthermore, the argument r
specifies the margin of error for estimated cumulative probabilities and s
the probability for the margin of error. eps
specifies the tolerance within which the probabilities of transitioning from initial to retained iterations are within the equilibrium probabilities for the chain. This argument determines the number of samples to discard as a burn-in sequence and is typically left at its default value.
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- Geyer1992Geyer, C. J. (1992). Practical Markov Chain Monte Carlo. Statistical Science, 473-483.
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- BDA3Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC press.
- Glynn1991Glynn, P. W., & Whitt, W. (1991). Estimating the asymptotic variance with batch means. Operations Research Letters, 10(8), 431-435.
- Geyer1992Geyer, C. J. (1992). Practical Markov Chain Monte Carlo. Statistical Science, 473-483.
- Betancourt2018Betancourt M. (2018). A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv:1701.02434v2 [stat.ME]
- Betancourt2016Betancourt M. (2016). Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo. arXiv:1604.00695v1 [stat.ME]
- Gelman1992Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4), 457-472.
- Brooks1998Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of computational and graphical statistics, 7(4), 434-455.
- Geweke1991Geweke, J. F. (1991). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments (No. 148). Federal Reserve Bank of Minneapolis.
- Heidelberger1983Heidelberger, P., & Welch, P. D. (1983). Simulation run length control in the presence of an initial transient. Operations Research, 31(6), 1109-1144.
- Raftery1992A L Raftery and S Lewis. Bayesian Statistics, chapter How Many Iterations in the Gibbs Sampler? Volume 4. Oxford University Press, New York, 1992.