`CalibrationErrors.Bin` — Method

`Bin(predictions, targets)`

Create a bin of `predictions` and corresponding `targets`.

`CalibrationErrors.Bin` — Method

`Bin(prediction, target)`

Create a bin of a single `prediction` and corresponding `target`.

`CalibrationErrors.CalibrationErrorEstimator` — Method

`(estimator::CalibrationErrorEstimator)(predictions, targets)`

Estimate the calibration error of a model from the set of `predictions` and corresponding `targets` using the `estimator`.

`CalibrationErrors.ECE` — Method

`ECE(binning[, distance = TotalVariation()])`

Estimator of the expected calibration error (ECE) for a classification model with respect to the given `distance` function using the `binning` algorithm.

For classification models, the predictions $P_{X_i}$ and targets $Y_i$ are identified with vectors in the probability simplex. The estimator of the ECE is defined as

\[\frac{1}{B} \sum_{i=1}^B d\big(\overline{P}_i, \overline{Y}_i\big),\]

where $B$ is the number of non-empty bins, $d$ is the distance function, and $\overline{P}_i$ and $\overline{Y}_i$ are the average vector of the predictions and the average vector of targets in the $i$th bin. By default, the total variation distance is used.

The `distance` has to be a function of the form `distance(pbar::Vector{<:Real}, ybar::Vector{<:Real})`. In particular, distance measures of the package Distances.jl are supported.
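The per-bin average in the ECE formula above can be illustrated with a short self-contained sketch. This is a hypothetical illustration of the formula, not the package API; the names `total_variation` and `ece_from_bins` are made up for this example, and targets are assumed to be one-hot encoded.

```julia
# Total variation distance between two probability vectors
total_variation(p, q) = sum(abs, p .- q) / 2

# ECE formula from above: average the distance between the mean prediction
# vector and the mean (one-hot) target vector over the non-empty bins.
function ece_from_bins(bins)
    # `bins` is a vector of (predictions, targets) tuples, one per non-empty bin
    d = 0.0
    for (ps, ys) in bins
        pbar = sum(ps) / length(ps)  # average prediction vector in the bin
        ybar = sum(ys) / length(ys)  # average target vector in the bin
        d += total_variation(pbar, ybar)
    end
    return d / length(bins)
end

# Single bin with two samples of a binary problem:
# pbar = [0.7, 0.3], ybar = [1, 0], so the ECE is TV([0.7, 0.3], [1, 0]) = 0.3
bins = [([[0.8, 0.2], [0.6, 0.4]], [[1.0, 0.0], [1.0, 0.0]])]
ece_from_bins(bins)
```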

`CalibrationErrors.MedianVarianceBinning` — Type

`MedianVarianceBinning([minsize::Int = 10, maxbins::Int = typemax(Int)])`

Dynamic binning scheme of the probability simplex with at most `maxbins` bins that each contain at least `minsize` samples.

The data set is split recursively as long as it is possible to split the bins while satisfying these conditions. In each step, the bin with the maximum variance of predicted probabilities for any component is selected and split at the median of the predicted probability of the component with the largest variance.
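A single splitting step of the scheme described above can be sketched as follows. This is a hypothetical illustration rather than the package implementation, and `split_at_median` is an assumed helper name.

```julia
using Statistics

# One split step: pick the component of the predicted probability vectors
# with the largest variance and split the data at its median.
function split_at_median(predictions)
    ncomponents = length(first(predictions))
    vars = [var(p[i] for p in predictions) for i in 1:ncomponents]
    c = argmax(vars)                       # component with maximum variance
    m = median(p[c] for p in predictions)  # split point
    left = filter(p -> p[c] <= m, predictions)
    right = filter(p -> p[c] > m, predictions)
    return left, right
end

predictions = [[0.1, 0.9], [0.4, 0.6], [0.6, 0.4], [0.9, 0.1]]
split_at_median(predictions)
```

In the full scheme this step would be applied recursively, always to the bin with the current maximum component variance, until `maxbins` is reached or every further split would violate `minsize`.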

`CalibrationErrors.SKCE` — Type

`SKCE(k; unbiased::Bool=true, blocksize=identity)`

Estimator of the squared kernel calibration error (SKCE) with kernel `k`.

Kernel `k` on the product space of predictions and targets has to be a `Kernel` from the Julia package KernelFunctions.jl that can be evaluated for inputs that are tuples of predictions and targets.

One can choose an unbiased or a biased variant with `unbiased=true` or `unbiased=false`, respectively (see details below).

The SKCE is estimated as the average estimate of different blocks of samples. The number of samples per block is set by `blocksize`:

- If `blocksize` is a function `blocksize(n::Int)`, then the number of samples per block is set to `blocksize(n)`, where `n` is the total number of samples.
- If `blocksize` is an integer, then the number of samples per block is set to `blocksize`, independent of the total number of samples.

The default setting `blocksize=identity` implies that a single block with all samples is used.

The number of samples per block must be at least 1 if `unbiased=false` and 2 if `unbiased=true`. Additionally, it must be at most the total number of samples. Note that the last block is neglected if it is incomplete (see details below).

**Details**

The unbiased estimator is not guaranteed to be non-negative whereas the biased estimator is always non-negative.

The sample complexity of the estimator is $O(mn)$, where $m$ is the block size and $n$ is the total number of samples. In particular, with the default setting `blocksize=identity` the estimator has a quadratic sample complexity.

Let $(P_{X_i}, Y_i)_{i=1,\ldots,n}$ be a data set of predictions and corresponding targets. The estimator with block size $m$ is defined as

\[{\bigg\lfloor \frac{n}{m} \bigg\rfloor}^{-1} \sum_{b=1}^{\lfloor n/m \rfloor} |B_b|^{-1} \sum_{(i, j) \in B_b} h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big),\]

where

\[\begin{aligned} h_k\big((μ, y), (μ', y')\big) ={}& k\big((μ, y), (μ', y')\big) - 𝔼_{Z ∼ μ} k\big((μ, Z), (μ', y')\big) \\ & - 𝔼_{Z' ∼ μ'} k\big((μ, y), (μ', Z')\big) + 𝔼_{Z ∼ μ, Z' ∼ μ'} k\big((μ, Z), (μ', Z')\big) \end{aligned}\]

and blocks $B_b$ ($b = 1, \ldots, \lfloor n/m \rfloor$) are defined as

\[B_b = \begin{cases} \{(i, j): (b - 1) m < i < j \leq bm \} & \text{(unbiased)}, \\ \{(i, j): (b - 1) m < i, j \leq bm \} & \text{(biased)}. \end{cases}\]
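For a finite label space, the expectations in $h_k$ reduce to finite sums over the classes, and the unbiased block estimator above can be sketched in a few lines. This is a hypothetical illustration, not the package implementation: `kernel`, `hk`, and `skce_blocks` are made-up names, and the simple product kernel $k((μ, y), (μ', y')) = \exp(-\lVert μ - μ' \rVert_1)\,δ(y, y')$ stands in for a `Kernel` from KernelFunctions.jl.

```julia
# Simple product kernel on (probability vector, class label) tuples
kernel(μ, y, μ′, y′) = exp(-sum(abs, μ .- μ′)) * (y == y′)

# h_k from the definition above; expectations over Z ∼ μ and Z′ ∼ μ′ are
# finite sums weighted by the predicted class probabilities.
function hk(μ, y, μ′, y′)
    nclasses = length(μ)
    kernel(μ, y, μ′, y′) -
        sum(μ[z] * kernel(μ, z, μ′, y′) for z in 1:nclasses) -
        sum(μ′[z′] * kernel(μ, y, μ′, z′) for z′ in 1:nclasses) +
        sum(μ[z] * μ′[z′] * kernel(μ, z, μ′, z′) for z in 1:nclasses, z′ in 1:nclasses)
end

# Unbiased block estimator with block size m; an incomplete last block is dropped.
function skce_blocks(predictions, targets, m)
    n = length(predictions)
    nblocks = div(n, m)
    estimates = map(1:nblocks) do b
        idx = ((b - 1) * m + 1):(b * m)
        ij = [(i, j) for i in idx for j in idx if i < j]  # unbiased: i < j
        sum(hk(predictions[i], targets[i], predictions[j], targets[j]) for (i, j) in ij) / length(ij)
    end
    return sum(estimates) / nblocks
end

predictions = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
targets = [1, 1, 2, 2]
skce_blocks(predictions, targets, 2)
```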

**References**

Widmann, D., Lindsten, F., & Zachariah, D. (2019). Calibration tests in multi-class classification: A unifying framework. In: Advances in Neural Information Processing Systems (NeurIPS 2019) (pp. 12257–12267).

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. *ICLR 2021*.

`CalibrationErrors.UCME` — Type

`UCME(k, testpredictions, testtargets)`

Estimator of the unnormalized calibration mean embedding (UCME) with kernel `k` and sets of `testpredictions` and `testtargets`.

Kernel `k` on the product space of predictions and targets has to be a `Kernel` from the Julia package KernelFunctions.jl that can be evaluated for inputs that are tuples of predictions and targets.

The number of test predictions and test targets must be the same and at least one.

**Details**

The estimator is biased and guaranteed to be non-negative. Its sample complexity is $O(mn)$, where $m$ is the number of test locations and $n$ is the total number of samples.

Let $(T_i)_{i=1,\ldots,m}$ be the set of test locations, i.e., test predictions and corresponding targets, and let $(P_{X_j}, Y_j)_{j=1,\ldots,n}$ be a data set of predictions and corresponding targets. The plug-in estimator of $\mathrm{UCME}_{k,m}^2$ is defined as

\[m^{-1} \sum_{i=1}^{m} {\bigg(n^{-1} \sum_{j=1}^n k\big(T_i, (P_{X_j}, Y_j)\big) - \mathbb{E}_{Z \sim P_{X_j}} k\big(T_i, (P_{X_j}, Z)\big)\bigg)}^2.\]
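For a finite label space the inner expectation over $Z \sim P_{X_j}$ is a finite sum over the classes, and the plug-in estimator above can be sketched directly. This is a hypothetical illustration, not the package API: `kernel` and `ucme` are made-up names, and the simple product kernel used here stands in for a `Kernel` from KernelFunctions.jl.

```julia
# Simple product kernel on (probability vector, class label) tuples
kernel(μ, y, μ′, y′) = exp(-sum(abs, μ .- μ′)) * (y == y′)

# Plug-in estimator of UCME² from the formula above
function ucme(testpredictions, testtargets, predictions, targets)
    m, n = length(testpredictions), length(predictions)
    s = 0.0
    for i in 1:m
        tp, ty = testpredictions[i], testtargets[i]
        inner = 0.0
        for j in 1:n
            μ, y = predictions[j], targets[j]
            # k(T_i, (P_{X_j}, Y_j)) minus the expectation over Z ∼ P_{X_j}
            inner += kernel(tp, ty, μ, y) -
                sum(μ[z] * kernel(tp, ty, μ, z) for z in eachindex(μ))
        end
        s += (inner / n)^2
    end
    return s / m
end

testpredictions = [[0.5, 0.5], [0.9, 0.1]]
testtargets = [1, 2]
predictions = [[0.8, 0.2], [0.3, 0.7]]
targets = [1, 2]
ucme(testpredictions, testtargets, predictions, targets)
```

The squared inner term makes every summand non-negative, which matches the guarantee stated in the details above.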

**References**

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. To be presented at *ICLR 2021*.

`CalibrationErrors.UniformBinning` — Type

`UniformBinning(nbins::Int)`

Binning scheme of the probability simplex with `nbins` bins of uniform width for each component.
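Component-wise uniform binning can be illustrated by computing the bin index of a single prediction. This is a hypothetical sketch, not the package implementation; `binindex` is an assumed helper name.

```julia
# Map each component of a probability vector to its bin index in 1:nbins,
# where every component is split into nbins intervals of uniform width.
function binindex(prediction, nbins)
    # clamp so that a component of exactly 0.0 lands in the first bin
    return map(p -> clamp(ceil(Int, p * nbins), 1, nbins), prediction)
end

binindex([0.05, 0.95], 10)  # ⇒ [1, 10]
```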

`CalibrationErrors.adddata!` — Method

`adddata!(bin::Bin, prediction, target)`

Update the running statistics of the `bin` by integrating one additional pair of `prediction` and `target`.
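A running update of this kind can be sketched with a minimal bin type that keeps only the sample count and the sums of predictions and targets, from which the per-bin averages in the ECE formula can be recovered. This is a hypothetical illustration, not the package's `Bin` type; `RunningBin` and `add!` are made-up names.

```julia
# Minimal running statistics for one bin
mutable struct RunningBin
    nsamples::Int
    sum_predictions::Vector{Float64}
    sum_targets::Vector{Float64}
end

RunningBin(nclasses::Int) = RunningBin(0, zeros(nclasses), zeros(nclasses))

# Integrate one additional (prediction, target) pair in O(nclasses) time
function add!(bin::RunningBin, prediction, target)
    bin.nsamples += 1
    bin.sum_predictions .+= prediction
    bin.sum_targets .+= target
    return bin
end

bin = RunningBin(2)
add!(bin, [0.8, 0.2], [1.0, 0.0])
add!(bin, [0.6, 0.4], [1.0, 0.0])
bin.sum_predictions ./ bin.nsamples  # average prediction of the bin
```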

`CalibrationErrors.unsafe_skce_eval` — Function

`unsafe_skce_eval(k, p, y, p̃, ỹ)`

Evaluate

\[k((p, y), (p̃, ỹ)) - E_{z ∼ p}[k((p, z), (p̃, ỹ))] - E_{z̃ ∼ p̃}[k((p, y), (p̃, z̃))] + E_{z ∼ p, z̃ ∼ p̃}[k((p, z), (p̃, z̃))]\]

for kernel `k` and predictions `p` and `p̃` with corresponding targets `y` and `ỹ`.

This method assumes that `p`, `p̃`, `y`, and `ỹ` are valid and specified correctly, and does not perform any checks.