Hypothesis tests of calibration.

Build Status DOI Codecov Coveralls Code Style: Blue Aqua QA

There are also Python and R interfaces for this package


This package implements different hypothesis tests for calibration of probabilistic models in the Julia language.

The statistical tests in this package are based on the calibration error estimators in the package CalibrationErrors.jl.

pycalibration is a Python interface for CalibrationErrors.jl and CalibrationTests.jl.

rcalibration is an R interface for CalibrationErrors.jl and CalibrationTests.jl.


If you use CalibrationsTests.jl as part of your research, teaching, or other activities, please consider citing the following publications:

Widmann, D., Lindsten, F., & Zachariah, D. (2019). Calibration tests in multi-class classification: A unifying framework. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (pp. 12257–12267).

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. To be presented at ICLR 2021.


This work was financially supported by the Swedish Research Council via the projects Learning of Large-Scale Probabilistic Dynamical Models (contract number: 2016-04278), Counterfactual Prediction Methods for Heterogeneous Populations (contract number: 2018-05040), and Handling Uncertainty in Machine Learning Systems (contract number: 2020-04122), by the Swedish Foundation for Strategic Research via the project Probabilistic Modeling and Inference for Machine Learning (contract number: ICA16-0015), by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and by ELLIIT.