CalibrationTests.AsymptoticSKCETest (Type)
AsymptoticSKCETest(kernel::Kernel, predictions, targets)

Calibration hypothesis test based on the unbiased estimator of the squared kernel calibration error (SKCE) with quadratic sample complexity.
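As a minimal usage sketch (the kernel choice and toy data below are illustrative assumptions, and the p-value is computed via the HypothesisTests.jl interface):

```julia
using CalibrationTests, KernelFunctions, HypothesisTests

# Illustrative toy data: 100 predicted probability vectors over 3 classes
# and corresponding observed class labels
predictions = map(1:100) do _
    p = rand(3)
    p ./ sum(p)  # normalize to a probability vector
end
targets = rand(1:3, 100)

# A tensor product kernel on predictions and targets (an assumed choice)
kernel = ExponentialKernel() ⊗ WhiteKernel()

test = AsymptoticSKCETest(kernel, predictions, targets)
pvalue(test)  # bootstrap approximation of the p-value
```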

Details

Let $\mathcal{D} = (P_{X_i}, Y_i)_{i=1,\ldots,n}$ be a data set of predictions and corresponding targets. Denote the null hypothesis "the predictive probabilistic model is calibrated" with $H_0$.

The hypothesis test approximates the p-value $ℙ(\mathrm{SKCE}_{uq} > c \,|\, H_0)$, where $c$ is the value of the estimator observed for the data set $\mathcal{D}$ and $\mathrm{SKCE}_{uq}$ is the unbiased estimator of the SKCE, defined as

\[\frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big),\]

where

\[\begin{aligned} h_k\big((μ, y), (μ', y')\big) ={}& k\big((μ, y), (μ', y')\big) - 𝔼_{Z ∼ μ} k\big((μ, Z), (μ', y')\big) \\ & - 𝔼_{Z' ∼ μ'} k\big((μ, y), (μ', Z')\big) + 𝔼_{Z ∼ μ, Z' ∼ μ'} k\big((μ, Z), (μ', Z')\big). \end{aligned}\]
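For illustration, the estimator can be evaluated from a precomputed symmetric matrix of pairwise $h_k$ evaluations; the helper `skce_uq` below is a hypothetical sketch, not the package implementation:

```julia
# Unbiased SKCE estimator with quadratic sample complexity, assuming
# H[i, j] = h_k((P_{X_i}, Y_i), (P_{X_j}, Y_j)) is precomputed and symmetric
function skce_uq(H::AbstractMatrix{<:Real})
    n = size(H, 1)
    s = zero(float(eltype(H)))
    for j in 2:n, i in 1:(j - 1)  # sum over all pairs i < j
        s += H[i, j]
    end
    return 2 * s / (n * (n - 1))
end
```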

The p-value is estimated based on the asymptotically valid approximation

\[ℙ(n\mathrm{SKCE}_{uq} > c \,|\, H_0) \approx ℙ(T > c \,|\, \mathcal{D}),\]

where $T$ is the bootstrap statistic

\[T = \frac{2}{n} \sum_{1 \leq i < j \leq n} \bigg(h_k\big((P^*_{X_i}, Y^*_i), (P^*_{X_j}, Y^*_j)\big) - \frac{1}{n} \sum_{r = 1}^n h_k\big((P^*_{X_i}, Y^*_i), (P_{X_r}, Y_r)\big) - \frac{1}{n} \sum_{r = 1}^n h_k\big((P_{X_r}, Y_r), (P^*_{X_j}, Y^*_j)\big) + \frac{1}{n^2} \sum_{r, s = 1}^n h_k\big((P_{X_r}, Y_r), (P_{X_s}, Y_s)\big)\bigg)\]

for bootstrap samples $(P^*_{X_i}, Y^*_i)_{i=1,\ldots,n}$ of $\mathcal{D}$. This can be reformulated as the approximation

\[ℙ(n\mathrm{SKCE}_{uq}/(n - 1) - \mathrm{SKCE}_b > c \,|\, H_0) \approx ℙ(T' > c \,|\, \mathcal{D}),\]

where

\[\mathrm{SKCE}_b = \frac{1}{n^2} \sum_{i, j = 1}^n h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big)\]

and

\[T' = \frac{2}{n(n - 1)} \sum_{1 \leq i < j \leq n} h_k\big((P^*_{X_i}, Y^*_i), (P^*_{X_j}, Y^*_j)\big) - \frac{2}{n^2} \sum_{i, r=1}^n h_k\big((P^*_{X_i}, Y^*_i), (P_{X_r}, Y_r)\big).\]

References

Widmann, D., Lindsten, F., & Zachariah, D. (2019). Calibration tests in multi-class classification: A unifying framework. In: Advances in Neural Information Processing Systems (NeurIPS 2019) (pp. 12257–12267).

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. In: International Conference on Learning Representations (ICLR 2021).

CalibrationTests.bootstrap_ccdf (Method)
bootstrap_ccdf(rng::AbstractRNG, statistic, kernelmatrix, bootstrap_iters::Int)

Estimate the value of the complementary CDF (ccdf) of the test statistic under the calibration null hypothesis by bootstrapping.

Details

Let $\mathcal{D} = (P_{X_i}, Y_i)_{i=1,\ldots,n}$ be a data set of predictions and corresponding targets. Denote the null hypothesis "the predictive probabilistic model is calibrated" with $H_0$, and the test statistic with $T$.

The value of the ccdf under the null hypothesis is estimated based on the asymptotically valid approximation

\[ℙ(T > c \,|\, H_0) \approx ℙ(T' > c \,|\, \mathcal{D}),\]

where the bootstrap statistic $T'$ is defined as

\[T' = \frac{2}{n(n - 1)} \sum_{1 \leq i < j \leq n} h_k\big((P^*_{X_i}, Y^*_i), (P^*_{X_j}, Y^*_j)\big) - \frac{2}{n^2} \sum_{i, r=1}^n h_k\big((P^*_{X_i}, Y^*_i), (P_{X_r}, Y_r)\big)\]

for bootstrap samples $(P^*_{X_i}, Y^*_i)_{i=1,\ldots,n}$ of $\mathcal{D}$ (see AsymptoticSKCETest).

Let $C_i$ be the number of times that data pair $(P_{X_i}, Y_i)$ was resampled. Then we obtain

\[T' = \frac{1}{n^2} \sum_{i=1}^n C_i \sum_{j=1}^n \bigg(\frac{n}{n-1} (C_j - \delta_{i,j}) - 2\bigg) h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big).\]
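A minimal sketch of one bootstrap replicate based on this count formula (`bootstrap_statistic` is a hypothetical helper, assuming a precomputed symmetric matrix `K` with `K[i, j]` equal to $h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big)$):

```julia
using Random

# One bootstrap replicate of T' from resample counts, assuming
# K[i, j] = h_k((P_{X_i}, Y_i), (P_{X_j}, Y_j)) is precomputed and symmetric
function bootstrap_statistic(rng::AbstractRNG, K::AbstractMatrix{<:Real})
    n = size(K, 1)
    C = zeros(Int, n)
    for _ in 1:n               # draw n data pairs with replacement
        C[rand(rng, 1:n)] += 1
    end
    T = zero(float(eltype(K)))
    for i in 1:n
        C[i] == 0 && continue  # pairs that were never resampled do not contribute
        for j in 1:n
            T += C[i] * (n / (n - 1) * (C[j] - (i == j)) - 2) * K[i, j]
        end
    end
    return T / n^2
end
```

Averaging the indicator $1(T' > c)$ over many such replicates yields the bootstrap estimate of $ℙ(T > c \,|\, H_0)$ at the observed statistic value $c$.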

CalibrationTests.estimate_statistic_kernelmatrix (Method)
estimate_statistic_kernelmatrix(kernel, predictions, targets)

Compute the estimate of the SKCE, the test statistic, and the matrix of pairwise evaluations of the function $h_k$.

Details

Let $\mathcal{D} = (P_{X_i}, Y_i)_{i=1,\ldots,n}$ be a data set of predictions and corresponding targets.

The unbiased estimator $\mathrm{SKCE}_{uq}$ of the SKCE is defined as

\[\frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big),\]

where

\[\begin{aligned} h_k\big((μ, y), (μ', y')\big) ={}& k\big((μ, y), (μ', y')\big) - 𝔼_{Z ∼ μ} k\big((μ, Z), (μ', y')\big) \\ & - 𝔼_{Z' ∼ μ'} k\big((μ, y), (μ', Z')\big) + 𝔼_{Z ∼ μ, Z' ∼ μ'} k\big((μ, Z), (μ', Z')\big). \end{aligned}\]

The test statistic is defined as

\[\frac{n}{n-1} \mathrm{SKCE}_{uq} - \mathrm{SKCE}_b,\]

where

\[\mathrm{SKCE}_b = \frac{1}{n^2} \sum_{i, j = 1}^n h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big)\]

(see AsymptoticSKCETest). This is equivalent to

\[\frac{1}{n^2} \sum_{i, j = 1}^n h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big) \bigg(\frac{n^2}{(n - 1)^2} 1(i \neq j) - 1\bigg).\]
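The equivalence can be checked by expanding the two terms: by the symmetry of $h_k$,

\[\frac{n}{n-1} \mathrm{SKCE}_{uq} = \frac{1}{(n-1)^2} \sum_{i \neq j} h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big),\]

so subtracting $\mathrm{SKCE}_b$ weights the off-diagonal terms by $1/(n-1)^2 - 1/n^2 = \big(n^2/(n-1)^2 - 1\big)/n^2$ and the diagonal terms by $-1/n^2$.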

The kernel matrix $K \in \mathbb{R}^{n \times n}$ is defined as

\[K_{ij} = h_k\big((P_{X_i}, Y_i), (P_{X_j}, Y_j)\big)\]

for $i, j \in \{1, \ldots, n\}$.
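A sketch of how the test statistic could be evaluated from such a matrix (`test_statistic` is a hypothetical helper, not the package's internal routine):

```julia
# Test statistic n/(n-1) * SKCE_uq - SKCE_b from a precomputed matrix
# K with K[i, j] = h_k((P_{X_i}, Y_i), (P_{X_j}, Y_j))
function test_statistic(K::AbstractMatrix{<:Real})
    n = size(K, 1)
    offdiag_weight = n^2 / (n - 1)^2
    s = zero(float(eltype(K)))
    for j in 1:n, i in 1:n
        # off-diagonal entries are weighted by n^2/(n-1)^2 - 1,
        # diagonal entries by -1
        s += K[i, j] * ((i != j) * offdiag_weight - 1)
    end
    return s / n^2
end
```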