ConformalPrediction

Documentation for ConformalPrediction.jl.

ConformalPrediction.jl is a package for Uncertainty Quantification (UQ) through Conformal Prediction (CP) in Julia. It is designed to work with supervised models trained in MLJ. Conformal Prediction is distribution-free, easy-to-understand, easy-to-use and model-agnostic.

Installation 🚩

You can install the first stable release from Julia's General registry:

using Pkg
Pkg.add("ConformalPrediction")

The development version can be installed as follows:

using Pkg
Pkg.add(url="https://github.com/pat-alt/ConformalPrediction.jl")

Status 🔁

This package is in its very early stages of development and therefore still subject to changes to the core architecture. The following approaches have been implemented in the development version:

Regression:

  • Inductive
  • Naive Transductive
  • Jackknife
  • Jackknife+
  • Jackknife-minmax
  • CV+
  • CV-minmax

Classification:

  • Inductive (LABEL (Sadinle, Lei, and Wasserman 2019))
  • Adaptive Inductive

I have only tested it for a few of the supervised models offered by MLJ.
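
To give a sense of how these approaches are exposed, the sketch below instantiates a conformal model with a specific method. The method and coverage keyword arguments reflect my reading of the development version's interface and should be treated as assumptions; consult the package documentation for the exact API.

using MLJ
using ConformalPrediction

# Load an atomic MLJ model as usual.
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor()

# Assumed API: select the Jackknife+ approach and a 90% nominal
# coverage rate via keyword arguments.
conf_model = conformal_model(model; method=:jackknife_plus, coverage=0.9)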

Usage Example 🔍

To illustrate the intended use of the package, let's have a quick look at a simple regression problem. Using MLJ, we first generate some synthetic data and then determine indices for our training and test data (40% of the samples each; the remainder is left unused):

using MLJ
X, y = MLJ.make_regression(1000, 2)
train, test = partition(eachindex(y), 0.4, 0.4)

We then import a decision tree regressor (DecisionTreeRegressor from the DecisionTree.jl package) following the standard MLJ procedure.

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor() 

To turn our conventional model into a conformal model, we just need to declare it as such by using the conformal_model wrapper function. The generated conformal model instance can then be wrapped in data to create a machine. Finally, we proceed by fitting the machine on the training data using the generic fit! method:

using ConformalPrediction
conf_model = conformal_model(model)
mach = machine(conf_model, X, y)
fit!(mach, rows=train)

Predictions can then be computed using the generic predict method. The code below produces predictions for the first n test samples. Each tuple contains the lower and upper bound of the prediction interval.

n = 10
Xtest = selectrows(X, first(test,n))
ytest = y[first(test,n)]
predict(mach, Xtest)
╭──────────────────────────────────────────────────────────────────╮
│                                                                  │
│       (1)   ([-0.16035036780321532], [1.4939904924997824])       │
│       (2)   ([-1.086589388667894], [0.5677514716351038])         │
│       (3)   ([-1.086589388667894], [0.5677514716351038])         │
│       (4)   ([-1.6661164684544767], [-0.011775608151479156])     │
│       (5)   ([-3.0116018507211617], [-1.3572609904181638])       │
│       (6)   ([0.5337083913933376], [2.1880492516963352])         │
│       (7)   ([-1.2219266921060266], [0.43241416819697115])       │
│       (8)   ([-1.6867950029289869], [-0.032454142625989335])     │
│       (9)   ([-2.0599181285783263], [-0.4055772682753287])       │
│      (10)   ([-0.06499897951385392], [1.5893418807891437])       │
│                                                                  │
│                                                                  │
╰─────────────────────────────────────────────────── 10 items ───╯
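
Since conformal prediction targets a nominal marginal coverage rate, a simple sanity check is to compute the empirical coverage on the held-out test samples. The following sketch is not part of the package; it merely assumes the (lower, upper) tuple format displayed above:

using Statistics

# Share of test samples whose true value falls inside the predicted
# interval (empirical coverage). Each prediction is assumed to be a
# tuple of one-element vectors (lower, upper), as shown above.
predictions = predict(mach, Xtest)
covered = [first(lwr) <= yi <= first(upr) for (yi, (lwr, upr)) in zip(ytest, predictions)]
println("Empirical coverage: ", mean(covered))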

Contribute 🛠

Contributions are welcome! Please follow the SciML ColPrac guide.

References 🎓

Sadinle, Mauricio, Jing Lei, and Larry Wasserman. 2019. โ€œLeast Ambiguous Set-Valued Classifiers with Bounded Error Levels.โ€ Journal of the American Statistical Association 114 (525): 223โ€“34.