CalibrationErrors.Bin — Method
Bin(predictions, targets)

Create a bin of predictions and corresponding targets.

CalibrationErrors.Bin — Method
Bin(prediction, target)

Create a bin of a single prediction and the corresponding target.

CalibrationErrors.CalibrationErrorEstimator — Method
(estimator::CalibrationErrorEstimator)(predictions, targets)

Estimate the calibration error of a model from the set of predictions and corresponding targets using the estimator.

CalibrationErrors.ECE — Method
ECE(binning[, distance = TotalVariation()])

Estimator of the expected calibration error (ECE) for a classification model with respect to the given distance function using the binning algorithm.

For classification models, the predictions $P_{X_i}$ and targets $Y_i$ are identified with vectors in the probability simplex. The estimator of the ECE is defined as

$$\frac{1}{B} \sum_{i=1}^B d\big(\overline{P}_i, \overline{Y}_i\big),$$

where $B$ is the number of non-empty bins, $d$ is the distance function, and $\overline{P}_i$ and $\overline{Y}_i$ are the average vector of the predictions and the average vector of targets in the $i$th bin. By default, the total variation distance is used.

The distance has to be a function of the form

distance(pbar::Vector{<:Real}, ybar::Vector{<:Real}).

In particular, distance measures from the package Distances.jl are supported.
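To make the formula above concrete, here is a hand-computed sketch in plain Julia (no packages): the bin assignment is fixed by hand, and the default total variation distance is implemented directly. This is an illustration of the estimator's definition, not the package implementation.

```julia
# Toy example: 4 two-class predictions, already assigned to 2 bins by hand.
# Targets are identified with one-hot vectors in the probability simplex.
predictions = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
targets     = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
bins = [[1, 2], [3, 4]]  # indices of the samples in each (non-empty) bin

# Total variation distance between two probability vectors.
tv(p, q) = sum(abs, p .- q) / 2

# ECE estimate: average distance between the mean prediction and the mean
# target, taken over the non-empty bins.
function ece(predictions, targets, bins)
    sum(bins) do idxs
        pbar = sum(predictions[idxs]) / length(idxs)
        ybar = sum(targets[idxs]) / length(idxs)
        tv(pbar, ybar)
    end / length(bins)
end

ece(predictions, targets, bins)  # ≈ 0.35
```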

CalibrationErrors.MedianVarianceBinning — Type
MedianVarianceBinning([minsize::Int = 10, maxbins::Int = typemax(Int)])

Dynamic binning scheme of the probability simplex with at most maxbins bins that each contain at least minsize samples.

The data set is split recursively as long as it is possible to split the bins while satisfying these conditions. In each step, the bin with the maximum variance of predicted probabilities for any component is selected and split at the median of the predicted probability of the component with the largest variance.
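A single split step of this scheme can be sketched in plain Julia (an illustration of the rule described above, not the package implementation): select the component with the largest variance of predicted probabilities and split the samples at its median.

```julia
using Statistics  # var, median

# One split step: pick the prediction component with the largest variance
# and split the samples at the median of that component.
function split_once(predictions::Vector{<:Vector{<:Real}})
    ncomponents = length(first(predictions))
    # component with maximal variance of predicted probabilities
    c = argmax([var(p[i] for p in predictions) for i in 1:ncomponents])
    m = median(p[c] for p in predictions)
    left  = [p for p in predictions if p[c] <= m]
    right = [p for p in predictions if p[c] > m]
    return left, right
end

split_once([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
```

The actual scheme repeats this step recursively, stopping when a split would violate the `minsize` or `maxbins` constraints.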

CalibrationErrors.SKCE — Type
SKCE(k; unbiased::Bool=true, blocksize=identity)

Estimator of the squared kernel calibration error (SKCE) with kernel k.

The kernel k on the product space of predictions and targets has to be a Kernel from the Julia package KernelFunctions.jl that can be evaluated for inputs that are tuples of predictions and targets.

One can choose an unbiased or a biased variant with unbiased=true or unbiased=false, respectively (see details below).

The SKCE is estimated as the average estimate of different blocks of samples. The number of samples per block is set by blocksize:

• If blocksize is a function blocksize(n::Int), then the number of samples per block is set to blocksize(n) where n is the total number of samples.
• If blocksize is an integer, then the number of samples per block is set to blocksize, independent of the total number of samples.

The default setting blocksize=identity implies that a single block with all samples is used.

The number of samples per block must be at least 1 if unbiased=false and 2 if unbiased=true. Additionally, it must be at most the total number of samples. Note that the last block is neglected if it is incomplete (see details below).
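The two `blocksize` conventions above can be sketched in plain Julia (illustrative only, not the package code): a function is applied to the total number of samples, an integer is used directly, and only complete blocks are counted.

```julia
# Samples per block, following the two conventions described above.
blocklength(blocksize, n::Int) = blocksize isa Function ? blocksize(n) : blocksize

# Number of complete blocks; an incomplete last block is dropped.
nblocks(blocksize, n::Int) = div(n, blocklength(blocksize, n))

blocklength(identity, 100), nblocks(identity, 100)  # (100, 1): one block with all samples
blocklength(10, 100), nblocks(10, 100)              # (10, 10)
blocklength(7, 100), nblocks(7, 100)                # (7, 14); the last 2 samples are dropped
```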

Details

The unbiased estimator is not guaranteed to be non-negative whereas the biased estimator is always non-negative.

The sample complexity of the estimator is $O(mn)$, where $m$ is the block size and $n$ is the total number of samples. In particular, with the default setting blocksize=identity the estimator has a quadratic sample complexity.

Let $(P_{X_i}, Y_i)_{i=1,\ldots,n}$ be a data set of predictions and corresponding targets. The estimator with block size $m$ is defined as

$${\bigg\lfloor \frac{n}{m} \bigg\rfloor}^{-1} \sum_{b=1}^{\lfloor n/m \rfloor} |B_b|^{-1} \sum_{(i, j) \in B_b} h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big),$$

where

$$\begin{aligned} h_k\big((\mu, y), (\mu', y')\big) ={}& k\big((\mu, y), (\mu', y')\big) - \mathbb{E}_{Z \sim \mu} k\big((\mu, Z), (\mu', y')\big) \\ & - \mathbb{E}_{Z' \sim \mu'} k\big((\mu, y), (\mu', Z')\big) + \mathbb{E}_{Z \sim \mu, Z' \sim \mu'} k\big((\mu, Z), (\mu', Z')\big) \end{aligned}$$

and blocks $B_b$ ($b = 1, \ldots, \lfloor n/m \rfloor$) are defined as

$$B_b = \begin{cases} \{(i, j): (b - 1) m < i < j \leq bm \} & \text{(unbiased)}, \\ \{(i, j): (b - 1) m < i, j \leq bm \} & \text{(biased)}. \end{cases}$$
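For classification models, the expectations in the definition of h_k are finite sums over the classes, so h_k can be computed exactly. A plain-Julia sketch with a hypothetical kernel (squared exponential on the predictions times a 0/1 kernel on the class labels, chosen only for illustration):

```julia
using LinearAlgebra: norm

# Hypothetical kernel on (prediction, target) tuples, for illustration only.
k((p, y), (q, z)) = exp(-norm(p - q)^2) * (y == z)

# h_k from the formula above; the expectations over Z ∼ μ and Z' ∼ μ'
# become finite sums over the classes.
function h(k, (p, y), (q, z))
    classes = eachindex(p)
    k((p, y), (q, z)) -
        sum(p[u] * k((p, u), (q, z)) for u in classes) -
        sum(q[v] * k((p, y), (q, v)) for v in classes) +
        sum(p[u] * q[v] * k((p, u), (q, v)) for u in classes, v in classes)
end

h(k, ([0.8, 0.2], 1), ([0.3, 0.7], 2))
```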

References

Widmann, D., Lindsten, F., & Zachariah, D. (2019). Calibration tests in multi-class classification: A unifying framework. In: Advances in Neural Information Processing Systems (NeurIPS 2019) (pp. 12257–12267).

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. In: International Conference on Learning Representations (ICLR 2021).

CalibrationErrors.UCME — Type
UCME(k, testpredictions, testtargets)

Estimator of the unnormalized calibration mean embedding (UCME) with kernel k and sets of testpredictions and testtargets.

The kernel k on the product space of predictions and targets has to be a Kernel from the Julia package KernelFunctions.jl that can be evaluated for inputs that are tuples of predictions and targets.

The number of test predictions and test targets must be the same and at least one.

Details

The estimator is biased and guaranteed to be non-negative. Its sample complexity is $O(mn)$, where $m$ is the number of test locations and $n$ is the total number of samples.

Let $(T_i)_{i=1,\ldots,m}$ be the set of test locations, i.e., test predictions and corresponding targets, and let $(P_{X_j}, Y_j)_{j=1,\ldots,n}$ be a data set of predictions and corresponding targets. The plug-in estimator of $\mathrm{UCME}_{k,m}^2$ is defined as

$$m^{-1} \sum_{i=1}^{m} {\bigg(n^{-1} \sum_{j=1}^n k\big(T_i, (P_{X_j}, Y_j)\big) - \mathbb{E}_{Z \sim P_{X_j}} k\big(T_i, (P_{X_j}, Z)\big)\bigg)}^2.$$
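For classification, the inner expectation is a finite sum over the classes, so the plug-in estimator can be computed by hand. A self-contained sketch with a hypothetical kernel (squared exponential on the predictions times a 0/1 kernel on the class labels, for illustration only):

```julia
using LinearAlgebra: norm

# Hypothetical kernel on (prediction, target) tuples, for illustration only.
k((p, q_y), (r, z)) = exp(-norm(p - r)^2) * (q_y == z)

# E_{Z ∼ p} k(t, (p, Z)) for a class prediction p: a finite sum over classes.
expected_k(t, p) = sum(p[z] * k(t, (p, z)) for z in eachindex(p))

# Plug-in estimator from the formula above: mean over test locations of the
# squared mean deviation over the samples.
function ucme(testlocations, predictions, targets)
    n = length(predictions)
    sum(testlocations) do t
        (sum(k(t, (predictions[j], targets[j])) - expected_k(t, predictions[j])
             for j in 1:n) / n)^2
    end / length(testlocations)
end

testlocations = [([0.5, 0.5], 1), ([0.9, 0.1], 2)]
predictions = [[0.8, 0.2], [0.3, 0.7]]
targets = [1, 2]
ucme(testlocations, predictions, targets)  # non-negative, as stated above
```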

References

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. In: International Conference on Learning Representations (ICLR 2021).

CalibrationErrors.UniformBinning — Type
UniformBinning(nbins::Int)

Binning scheme of the probability simplex with nbins bins of uniform width for each component.
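The component-wise discretization can be sketched in plain Julia (an illustration of the scheme, not the package implementation): each component of a probability vector is mapped to one of `nbins` intervals of uniform width `1/nbins`.

```julia
# Map a probability vector to a vector of bin indices, one per component,
# using nbins intervals of uniform width 1/nbins.
function binindex(p::Vector{<:Real}, nbins::Int)
    return map(p) do pi
        # the unit interval is split into nbins pieces; pi = 0 falls into bin 1
        clamp(ceil(Int, pi * nbins), 1, nbins)
    end
end

binindex([0.34, 0.66], 10)  # [4, 7]
binindex([0.0, 1.0], 2)     # [1, 2]
```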
