ParetoSmooth

Documentation for ParetoSmooth.

ParetoSmooth.PsisType
Psis{V<:AbstractVector{F},I<:Integer} where {F<:AbstractFloat}

A struct containing the results of Pareto-smoothed importance sampling.

Fields

  • weights: A vector of smoothed, truncated, and normalized importance sampling weights.
  • pareto_k: Estimates of the shape parameter $k$ of the generalized Pareto distribution.
  • ess: Estimated effective sample size for each LOO evaluation.
  • tail_len: Vector of tail lengths used for smoothing the generalized Pareto distribution.
  • dims: Named tuple of length 2 containing s (posterior sample size) and n (number of

observations).

ParetoSmooth._convert_to_arrayMethod

Convert a matrix+chain_index representation to a 3d array representation to pass it off to the method for arrays.

ParetoSmooth.def_tail_lengthMethod
def_tail_length(log_ratios::AbstractVector, r_eff::AbstractFloat) -> tail_len::Integer

Define the tail length as in Vehtari et al. (2019).

ParetoSmooth.do_psis_i!Method
do_psis_i!(is_ratios::AbstractVector{AbstractFloat}, tail_length::Integer) -> T

Do PSIS on a single vector, smoothing its tail values.

Arguments

  • is_ratios::AbstractVector{AbstractFloat}: A vector of importance sampling ratios,

scaled to have a maximum of 1.

Returns

  • T<:AbstractFloat: ξ, the shape parameter for the GPD; big numbers indicate thick tails.

Extended help

Additional information can be found in the LOO package from R.

ParetoSmooth.gpd_quantileMethod
gpd_quantile(p::T, k::T, sigma::T) where {T<:AbstractFloat} -> T

Compute the p quantile of the Generalized Pareto Distribution (GPD).

Arguments

  • p: A scalar between 0 and 1.
  • ξ: A scalar shape parameter.
  • σ: A scalar scale parameter.

Returns

A quantile of the Generalized Pareto Distribution.

ParetoSmooth.gpdfitMethod
gpdfit(
    sample::AbstractVector{T<:AbstractFloat}; 
    wip::Bool=true, 
    min_grid_pts::Integer=30, 
    sort_sample::Bool=false
) -> (ξ::T, σ::T)

Return a named list of estimates for the parameters ξ (shape) and σ (scale) of the generalized Pareto distribution (GPD), assuming the location parameter is 0.

Arguments

  • sample::AbstractVector: A numeric vector. The sample from which to estimate the parameters.
  • wip::Bool = true: Logical indicating whether to adjust ξ based on a weakly informative Gaussian prior centered on 0.5. Defaults to true.
  • min_grid_pts::Integer = 30: The minimum number of grid points used in the fitting algorithm. The actual number used is min_grid_pts + ⌊sqrt(length(sample))⌋.

Note

Estimation method taken from Zhang, J. and Stephens, M.A. (2009). The parameter ξ is the negative of $k$.

ParetoSmooth.looMethod
function loo(args...; method=PsisLooMethod(), kwargs...) -> PsisLoo

Compute the approximate leave-one-out cross-validation score using the specified method.

Currently, this function only serves to call psis_loo, but this could change in the future. The default methods or return type may change without warning; thus, we recommend using psis_loo instead if reproducibility is required.

See also: psis_loo, PsisLoo.

ParetoSmooth.psisMethod
psis(
    log_ratios::AbstractArray{T:>AbstractFloat}, 
    r_eff; 
    source::String="mcmc", 
    log_weights::Bool=false
) -> Psis

Implements Pareto-smoothed importance sampling (PSIS).

Arguments

Positional Arguments

  • log_ratios::AbstractArray{T}: An array of importance ratios on the log scale (for

PSIS-LOO these are negative log-likelihood values). Indices must be ordered as [data, draw, chain]: log_ratios[1, 2, 3] should be the log-likelihood of the first data point, evaluated at the second iteration of the third chain. Chain indices can be left off if there is only a single chain, or if keyword argument chain_index is provided.

  • r_eff::AbstractArray{T}: An (optional) vector of relative effective sample sizes used

in ESS calculations. If left empty, calculated automatically using the FFTESS method from InferenceDiagnostics.jl. See relative_eff to calculate these values.

Keyword Arguments

  • chain_index::Vector{Integer}: An (optional) vector of integers indicating which chain

each sample belongs to.

  • source::String="mcmc": A string or symbol describing the source of the sample being

used. If "mcmc", adjusts ESS for autocorrelation. Otherwise, samples are assumed to be independent. Currently permitted values are ["mcmc", "vi", "other"].

  • log_weights::Bool=false: Return log weights, rather than the PSIS weights.
ParetoSmooth.psis_essMethod
function psis_ess(
    weights::AbstractVector{T},
    r_eff::AbstractVector{T}
) -> AbstractVector{T}

Calculate the (approximate) effective sample size of a PSIS sample, using the correction in Vehtari et al. 2019.

Arguments

  • weights: A set of importance sampling weights derived from PSIS.
  • r_eff: The relative efficiency of the MCMC chains from which PSIS samples were derived. See ?relative_eff to calculate r_eff.
ParetoSmooth.psis_looMethod
function psis_loo(log_likelihood::Array{Float} [, args...];
    source::String="mcmc" [, chain_index::Vector{Int}, kwargs...]
) -> PsisLoo

Use Pareto-Smoothed Importance Sampling to calculate the leave-one-out cross validation score.

Arguments

  • log_likelihood::Array: An array or matrix of log-likelihood values indexed as [data, step, chain]. The chain argument can be left off if chain_index is provided or if all posterior samples were drawn from a single chain.
  • args...: Positional arguments to be passed to psis.
  • chain_index::Vector: A vector of integers specifying which chain each iteration belongs to. For instance, chain_index[iteration] should return 2 if log_likelihood[:, step] belongs to the second chain.
  • kwargs...: Keyword arguments to be passed to psis.

See also: psis, loo, PsisLoo.

ParetoSmooth.psis_smooth_tail!Method
psis_smooth_tail!(tail::AbstractVector{T}, cutoff::T) where {T<:AbstractFloat} -> ξ::T

Takes an already sorted vector of observations from the tail and smooths it in place with PSIS before returning shape parameter ξ.

ParetoSmooth.relative_effMethod
relative_eff(sample::AbstractArray{AbstractFloat, 3}; method=FFTESSMethod())

Calculate the relative efficiency of an MCMC chain, i.e. the effective sample size divided by the nominal sample size.