Estimation of calibration errors.


There are also Python and R interfaces for this package.


This package implements different estimators of the expected calibration error (ECE), the squared kernel calibration error (SKCE), and the unnormalized calibration mean embedding (UCME) in the Julia language.

This package supports calibration error estimation for classification models that output vectors of class probabilities. In addition, SKCE and UCME can be estimated for more general probabilistic predictive models that output probability distributions defined in Distributions.jl, such as normal and Laplace distributions.


Calibration errors can be estimated from a data set of predicted probability distributions and a set of corresponding observed targets by calling an estimator:

estimator(predictions, targets)

The sets of predictions and targets have to be provided as vectors.
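
As an illustration of this calling convention, the following minimal sketch estimates the ECE of a three-class classification model on synthetic data. The synthetic data, the choice of 10 uniform bins, and the total variation distance are assumptions made for this example, and the constructor names (ECE, UniformBinning, TotalVariation) should be checked against the documentation of the installed package version.

```julia
using CalibrationErrors
using Distributions
using Random

rng = MersenneTwister(42)

# Synthetic predictions: 500 probability vectors over 3 classes.
predictions = [rand(rng, Dirichlet(3, 1.0)) for _ in 1:500]

# Synthetic targets: class labels in 1:3, sampled from the predicted
# distributions, so the "model" is calibrated by construction.
targets = [rand(rng, Categorical(p)) for p in predictions]

# Assumed configuration: uniform binning with 10 bins per dimension
# and the total variation distance.
ece = ECE(UniformBinning(10), TotalVariation())

# Both arguments are vectors of equal length.
ece_value = ece(predictions, targets)
println("ECE estimate: ", ece_value)
```

Because the targets are drawn from the predicted distributions, the estimate should be small, although binned ECE estimates are generally not exactly zero on finite samples.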

This package implements the estimator ECE of the ECE, the estimator SKCE of the SKCE (unbiased and biased variants with different sample complexity), and the estimator UCME of the UCME.
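
The sketch below shows how the SKCE variants and the UCME might be constructed for a classification model. It assumes a recent package version in which the SKCE constructor accepts the keyword arguments unbiased and blocksize, in which UCME takes a kernel together with test predictions and test targets, and in which the kernels from KernelFunctions.jl are re-exported. The kernel choice and the synthetic data are illustrative assumptions only.

```julia
using CalibrationErrors
using Distributions
using Random

rng = MersenneTwister(42)

# Synthetic, calibrated three-class data (illustrative assumption).
predictions = [rand(rng, Dirichlet(3, 1.0)) for _ in 1:200]
targets = [rand(rng, Categorical(p)) for p in predictions]

# Kernel on (prediction, target) pairs: a tensor product of a kernel on
# predictions and a kernel on targets (assumed re-exported from
# KernelFunctions.jl).
kernel = tensor(SqExponentialKernel(), WhiteKernel())

# Unbiased estimator (assumed default), evaluated on all pairs of samples.
skce_unbiased = SKCE(kernel)(predictions, targets)

# Biased estimator.
skce_biased = SKCE(kernel; unbiased=false)(predictions, targets)

# Unbiased block estimator with blocks of size 2, which trades
# statistical efficiency for lower computational cost.
skce_block = SKCE(kernel; blocksize=2)(predictions, targets)

# UCME with 10 test locations, here drawn from the same synthetic model.
testpredictions = [rand(rng, Dirichlet(3, 1.0)) for _ in 1:10]
testtargets = [rand(rng, Categorical(p)) for p in testpredictions]
ucme = UCME(kernel, testpredictions, testtargets)(predictions, targets)

println((; skce_unbiased, skce_biased, skce_block, ucme))
```

Unbiased SKCE estimates can be negative on finite samples, so a slightly negative value for a calibrated model is not an error.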

CalibrationTests.jl implements statistical hypothesis tests of calibration.

pycalibration is a Python interface for CalibrationErrors.jl and CalibrationTests.jl.

rcalibration is an R interface for CalibrationErrors.jl and CalibrationTests.jl.

Talk at JuliaCon 2021

Calibration analysis of probabilistic models in Julia

The slides of the talk are available as a Pluto notebook.


If you use CalibrationErrors.jl as part of your research, teaching, or other activities, please consider citing the following publications:

Widmann, D., Lindsten, F., & Zachariah, D. (2019). Calibration tests in multi-class classification: A unifying framework. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (pp. 12257–12267).

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. International Conference on Learning Representations (ICLR 2021).


This work was financially supported by the Swedish Research Council via the projects Learning of Large-Scale Probabilistic Dynamical Models (contract number: 2016-04278), Counterfactual Prediction Methods for Heterogeneous Populations (contract number: 2018-05040), and Handling Uncertainty in Machine Learning Systems (contract number: 2020-04122), by the Swedish Foundation for Strategic Research via the project Probabilistic Modeling and Inference for Machine Learning (contract number: ICA16-0015), by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and by ELLIIT.