CalibrationErrors.Bin (Method)
Bin(predictions, targets)

Create bin of predictions and corresponding targets.

CalibrationErrors.Bin (Method)
Bin(prediction, target)

Create bin of a single prediction and corresponding target.

CalibrationErrors.CalibrationErrorEstimator (Method)
(estimator::CalibrationErrorEstimator)(predictions, targets)

Estimate the calibration error of a model from the set of predictions and corresponding targets using the estimator.
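
For example, any of the estimators documented below can be applied directly to the data. The following is a minimal sketch that assumes predictions are given as a vector of probability vectors and targets as a vector of integer class labels; other input representations may be supported as well.

    using CalibrationErrors

    # predictions as probability vectors, targets as class labels (assumed format)
    predictions = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
    targets = [1, 2, 2, 1]

    # any calibration error estimator is applied in the same way
    estimator = ECE(UniformBinning(10))
    estimator(predictions, targets)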

CalibrationErrors.ECE (Method)
ECE(binning[, distance = TotalVariation()])

Estimator of the expected calibration error (ECE) for a classification model with respect to the given distance function using the binning algorithm.

For classification models, the predictions $P_{X_i}$ and targets $Y_i$ are identified with vectors in the probability simplex. The estimator of the ECE is defined as

\[\frac{1}{B} \sum_{i=1}^B d\big(\overline{P}_i, \overline{Y}_i\big),\]

where $B$ is the number of non-empty bins, $d$ is the distance function, and $\overline{P}_i$ and $\overline{Y}_i$ are the average vector of the predictions and the average vector of targets in the $i$th bin. By default, the total variation distance is used.

The distance has to be a function of the form

distance(pbar::Vector{<:Real}, ybar::Vector{<:Real}).

In particular, distance measures of the package Distances.jl are supported.
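
A hedged sketch of selecting a distance, assuming that TotalVariation and SqEuclidean are loaded explicitly from Distances.jl:

    using CalibrationErrors
    using Distances: SqEuclidean, TotalVariation

    # default: total variation distance
    ece = ECE(UniformBinning(10), TotalVariation())

    # squared Euclidean distance instead
    ece_sq = ECE(UniformBinning(10), SqEuclidean())

    predictions = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    targets = [1, 2, 1]
    ece_sq(predictions, targets)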

CalibrationErrors.MedianVarianceBinning (Type)
MedianVarianceBinning([minsize::Int = 10, maxbins::Int = typemax(Int)])

Dynamic binning scheme of the probability simplex with at most maxbins bins that each contain at least minsize samples.

The data set is split recursively as long as it is possible to split the bins while satisfying these conditions. In each step, the bin with the maximum variance of predicted probabilities for any component is selected and split at the median of the predicted probability of the component with the largest variance.
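
A hedged sketch of constructing the binning scheme and passing it to a binning-based estimator such as ECE; the random data below is only for illustration:

    using CalibrationErrors
    using LinearAlgebra: normalize

    # at least 5 samples per bin, at most 100 bins
    binning = MedianVarianceBinning(5, 100)
    estimator = ECE(binning)

    # synthetic predictions on the probability simplex and random targets
    predictions = [normalize(rand(3), 1) for _ in 1:200]
    targets = rand(1:3, 200)
    estimator(predictions, targets)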

CalibrationErrors.SKCE (Type)
SKCE(k; unbiased::Bool=true, blocksize=identity)

Estimator of the squared kernel calibration error (SKCE) with kernel k.

The kernel k on the product space of predictions and targets has to be a Kernel from the Julia package KernelFunctions.jl that can be evaluated for inputs that are tuples of predictions and targets.

One can choose an unbiased or a biased variant with unbiased=true or unbiased=false, respectively (see details below).

The SKCE is estimated as the average of the estimates computed on separate blocks of samples. The number of samples per block is set by blocksize:

  • If blocksize is a function blocksize(n::Int), then the number of samples per block is set to blocksize(n) where n is the total number of samples.
  • If blocksize is an integer, then the number of samples per block is set to blocksize, independent of the total number of samples.

The default setting blocksize=identity implies that a single block with all samples is used.

The number of samples per block must be at least 1 if unbiased=false and 2 if unbiased=true. Additionally, it must be at most the total number of samples. Note that the last block is neglected if it is incomplete (see details below).
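
A hedged sketch, assuming a tensor product kernel from KernelFunctions.jl (a Gaussian kernel on the predictions combined with a White kernel on the targets) and the probability vector/class label representation used above:

    using CalibrationErrors
    using KernelFunctions

    # kernel on the product space of predictions and targets
    kernel = GaussianKernel() ⊗ WhiteKernel()

    skce = SKCE(kernel)                        # unbiased, single block with all samples
    skce_biased = SKCE(kernel; unbiased=false) # biased, always non-negative
    skce_blocks = SKCE(kernel; blocksize=2)    # unbiased, blocks of 2 samples

    predictions = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
    targets = [1, 2, 2, 1]
    skce(predictions, targets)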

Details

The unbiased estimator is not guaranteed to be non-negative whereas the biased estimator is always non-negative.

The sample complexity of the estimator is $O(mn)$, where $m$ is the block size and $n$ is the total number of samples. In particular, with the default setting blocksize=identity the estimator has a quadratic sample complexity.

Let $(P_{X_i}, Y_i)_{i=1,\ldots,n}$ be a data set of predictions and corresponding targets. The estimator with block size $m$ is defined as

\[{\bigg\lfloor \frac{n}{m} \bigg\rfloor}^{-1} \sum_{b=1}^{\lfloor n/m \rfloor} |B_b|^{-1} \sum_{(i, j) \in B_b} h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big),\]

where

\[\begin{aligned} h_k\big((μ, y), (μ', y')\big) ={}& k\big((μ, y), (μ', y')\big) - 𝔼_{Z ∼ μ} k\big((μ, Z), (μ', y')\big) \\ & - 𝔼_{Z' ∼ μ'} k\big((μ, y), (μ', Z')\big) + 𝔼_{Z ∼ μ, Z' ∼ μ'} k\big((μ, Z), (μ', Z')\big) \end{aligned}\]

and blocks $B_b$ ($b = 1, \ldots, \lfloor n/m \rfloor$) are defined as

\[B_b = \begin{cases} \{(i, j): (b - 1) m < i < j \leq bm \} & \text{(unbiased)}, \\ \{(i, j): (b - 1) m < i, j \leq bm \} & \text{(biased)}. \end{cases}\]

References

Widmann, D., Lindsten, F., & Zachariah, D. (2019). Calibration tests in multi-class classification: A unifying framework. In: Advances in Neural Information Processing Systems (NeurIPS 2019) (pp. 12257–12267).

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. In: International Conference on Learning Representations (ICLR 2021).

CalibrationErrors.UCME (Type)
UCME(k, testpredictions, testtargets)

Estimator of the unnormalized calibration mean embedding (UCME) with kernel k and sets of testpredictions and testtargets.

The kernel k on the product space of predictions and targets has to be a Kernel from the Julia package KernelFunctions.jl that can be evaluated for inputs that are tuples of predictions and targets.

The number of test predictions and test targets must be the same and at least one.
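
A hedged sketch, again assuming a tensor product kernel from KernelFunctions.jl and probability vectors with integer class labels as test locations:

    using CalibrationErrors
    using KernelFunctions

    kernel = GaussianKernel() ⊗ WhiteKernel()

    # test locations: test predictions with corresponding test targets
    testpredictions = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
    testtargets = [1, 1, 2]
    ucme = UCME(kernel, testpredictions, testtargets)

    predictions = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    targets = [1, 2, 1]
    ucme(predictions, targets)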

Details

The estimator is biased and guaranteed to be non-negative. Its sample complexity is $O(mn)$, where $m$ is the number of test locations and $n$ is the total number of samples.

Let $(T_i)_{i=1,\ldots,m}$ be the set of test locations, i.e., test predictions and corresponding targets, and let $(P_{X_j}, Y_j)_{j=1,\ldots,n}$ be a data set of predictions and corresponding targets. The plug-in estimator of $\mathrm{UCME}_{k,m}^2$ is defined as

\[m^{-1} \sum_{i=1}^{m} {\bigg(n^{-1} \sum_{j=1}^n k\big(T_i, (P_{X_j}, Y_j)\big) - \mathbb{E}_{Z \sim P_{X_j}} k\big(T_i, (P_{X_j}, Z)\big)\bigg)}^2.\]

References

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. In: International Conference on Learning Representations (ICLR 2021).

CalibrationErrors.UniformBinning (Type)
UniformBinning(nbins::Int)

Binning scheme of the probability simplex with nbins bins of uniform width for each component.
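
For example (a minimal sketch), UniformBinning(10) splits each component of the probability simplex into 10 bins of width 0.1 and can be passed to any binning-based estimator:

    using CalibrationErrors

    binning = UniformBinning(10)
    estimator = ECE(binning)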

CalibrationErrors.adddata! (Method)
adddata!(bin::Bin, prediction, target)

Update the running statistics of the bin by integrating one additional pair of prediction and target.

CalibrationErrors.unsafe_skce_eval (Function)
unsafe_skce_eval(k, p, y, p̃, ỹ)

Evaluate

\[k((p, y), (p̃, ỹ)) - E_{z ∼ p}[k((p, z), (p̃, ỹ))] - E_{z̃ ∼ p̃}[k((p, y), (p̃, z̃))] + E_{z ∼ p, z̃ ∼ p̃}[k((p, z), (p̃, z̃))]\]

for kernel k and predictions p and p̃ with corresponding targets y and ỹ.

This method assumes that p, p̃, y, and ỹ are valid and specified correctly, and does not perform any checks.