ConformalPrediction

Documentation for ConformalPrediction.jl.

ConformalPrediction.jl is a package for Uncertainty Quantification (UQ) through Conformal Prediction (CP) in Julia. It is designed to work with supervised models trained in MLJ. Conformal Prediction is distribution-free, easy-to-understand, easy-to-use and model-agnostic.

Installation 🚩

You can install the first stable release from Julia's General registry:

using Pkg
Pkg.add("ConformalPrediction")

The development version can be installed as follows:

using Pkg
Pkg.add(url="https://github.com/pat-alt/ConformalPrediction.jl")

Status 🔁

This package is in its very early stages of development and therefore still subject to changes to the core architecture. The following approaches have been implemented in the development version:

Regression:

  • Inductive
  • Naive Transductive
  • Jackknife
  • Jackknife+
  • Jackknife-minmax
  • CV+
  • CV-minmax

Classification:

  • Inductive (LABEL (Sadinle, Lei, and Wasserman 2019))
  • Adaptive Inductive

I have only tested it for a few of the supervised models offered by MLJ.
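
To give a sense of how these approaches are exposed, the sketch below instantiates a conformal model with a specific method. The method and coverage keyword arguments reflect my reading of the development version's interface and should be treated as assumptions; consult the package documentation for the exact API.

using MLJ
using ConformalPrediction

# Load an atomic MLJ model as usual.
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor()

# Assumed API: select the Jackknife+ approach and a 90% nominal
# coverage rate via keyword arguments.
conf_model = conformal_model(model; method=:jackknife_plus, coverage=0.9)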

Usage Example 🔍

To illustrate the intended use of the package, let's have a quick look at a simple regression problem. Using MLJ, we first generate some synthetic data and then determine indices for our training and test data (40% of the samples each; the remainder is left unused):

using MLJ
X, y = MLJ.make_regression(1000, 2)
train, test = partition(eachindex(y), 0.4, 0.4)

We then import a decision tree regressor (DecisionTreeRegressor from the DecisionTree.jl package) following the standard MLJ procedure.

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor() 

To turn our conventional model into a conformal model, we just need to declare it as such by using the conformal_model wrapper function. The generated conformal model instance can then be wrapped in data to create a machine. Finally, we proceed by fitting the machine on the training data using the generic fit! method:

using ConformalPrediction
conf_model = conformal_model(model)
mach = machine(conf_model, X, y)
fit!(mach, rows=train)

Predictions can then be computed using the generic predict method. The code below produces predictions for the first n test samples. Each tuple contains the lower and upper bound of the prediction interval.

n = 10
Xtest = selectrows(X, first(test,n))
ytest = y[first(test,n)]
predict(mach, Xtest)
╭──────────────────────────────────────────────────────────────────╮
│                                                                  │
│       (1)   ([-0.16035036780321532], [1.4939904924997824])       │
│       (2)   ([-1.086589388667894], [0.5677514716351038])         │
│       (3)   ([-1.086589388667894], [0.5677514716351038])         │
│       (4)   ([-1.6661164684544767], [-0.011775608151479156])     │
│       (5)   ([-3.0116018507211617], [-1.3572609904181638])       │
│       (6)   ([0.5337083913933376], [2.1880492516963352])         │
│       (7)   ([-1.2219266921060266], [0.43241416819697115])       │
│       (8)   ([-1.6867950029289869], [-0.032454142625989335])     │
│       (9)   ([-2.0599181285783263], [-0.4055772682753287])       │
│      (10)   ([-0.06499897951385392], [1.5893418807891437])       │
│                                                                  │
│                                                                  │
╰─────────────────────────────────────────────────── 10 items ───╯
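
Since conformal prediction targets a nominal marginal coverage rate, a simple sanity check is to compute the empirical coverage on the held-out test samples. The following sketch is not part of the package; it merely assumes the (lower, upper) tuple format displayed above:

using Statistics

# Share of test samples whose true value falls inside the predicted
# interval (empirical coverage). Each prediction is assumed to be a
# tuple of one-element vectors (lower, upper), as shown above.
predictions = predict(mach, Xtest)
covered = [first(lwr) <= yi <= first(upr) for (yi, (lwr, upr)) in zip(ytest, predictions)]
println("Empirical coverage: ", mean(covered))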

Contribute 🛠

Contributions are welcome! Please follow the SciML ColPrac guide.

References 🎓

Sadinle, Mauricio, Jing Lei, and Larry Wasserman. 2019. โ€œLeast Ambiguous Set-Valued Classifiers with Bounded Error Levels.โ€ Journal of the American Statistical Association 114 (525): 223โ€“34.