EvidentialFlux.jl

Evidential Deep Learning is a way to generate predictions, and the uncertainty associated with them, in a single forward pass. This is in stark contrast to traditional Bayesian neural networks, which are typically based on Variational Inference, Markov Chain Monte Carlo, Monte Carlo Dropout or ensembles.

Deep Evidential Regression

Deep Evidential Regression[amini2020] is an attempt to apply the principles of Evidential Deep Learning to regression-type problems.

It works by placing a prior distribution over the likelihood parameters $\mathbf{\theta} = \{\mu, \sigma^2\}$. We observe a dataset $\mathcal{D}=\{x_i, y_i\}_{i=1}^N$ in which each $y_i$ is assumed to be drawn i.i.d. from a Gaussian distribution.

\[y_i \sim \mathcal{N}(\mu_i, \sigma^2_i)\]

We can express the posterior over the parameters $\mathbf{\theta}=\{\mu, \sigma^2\}$ as $p(\mathbf{\theta}|\mathcal{D})$. We seek an approximation $q(\mu, \sigma^2) = q(\mu)q(\sigma^2)$, i.e. we assume that the posterior factorizes. This lets us write $\mu\sim\mathcal{N}(\gamma,\sigma^2\nu^{-1})$ and $\sigma^2\sim\Gamma^{-1}(\alpha,\beta)$. Thus, we can now form

\[p(\mathbf{\theta}|\mathbf{m})=\mathcal{N}(\gamma,\sigma^2\nu^{-1})\,\Gamma^{-1}(\alpha,\beta)=\mathcal{N}\text{-}\Gamma^{-1}(\gamma,\nu,\alpha,\beta)\]

which can be plugged in to the posterior below.

\[p(\mathbf{\theta}|\mathbf{m}, y_i) = \frac{p(y_i|\mathbf{\theta}, \mathbf{m})p(\mathbf{\theta}|\mathbf{m})}{p(y_i|\mathbf{m})}\]

Now, since the likelihood is Gaussian, we would like to put a conjugate prior on the parameters of that likelihood, and the Normal Inverse Gamma $\mathcal{N}\text{-}\Gamma^{-1}(\gamma, \nu, \alpha, \beta)$ fits the bill. I'm being a bit handwavy here, but this allows us to express the prediction and the associated uncertainty as below.

\[\underset{Prediction}{\underbrace{\mathbb{E}[\mu]=\gamma}}~~~~ \underset{Aleatoric}{\underbrace{\mathbb{E}[\sigma^2]=\frac{\beta}{\alpha-1}}}~~~~ \underset{Epistemic}{\underbrace{\text{Var}[\mu]=\frac{\beta}{\nu(\alpha-1)}}}\]

The NIG layer in EvidentialFlux.jl outputs 4 tensors for each target variable, namely $\gamma,\nu,\alpha,\beta$. This means that in one forward pass we can estimate the prediction, the heteroskedastic aleatoric uncertainty as well as the epistemic uncertainty. Boom!
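As a quick illustration, here is a minimal sketch of a regression model with an NIG head. The row layout of the output (the first out rows being γ, followed by ν, α and β) follows the layer documentation below, but treat the exact indexing, layer sizes and data here as illustrative assumptions rather than a definitive recipe.

using Flux, EvidentialFlux

# A tiny regression network whose head is the NIG layer: out = 1 target,
# so the forward pass returns 4 rows, one per NIG parameter.
m = Chain(Dense(1 => 16, relu), NIG(16 => 1))

x = randn(Float32, 1, 32)          # 32 observations of a single feature
ŷ = m(x)                           # size (4, 32)

γ = ŷ[1:1, :]                      # prediction: E[μ] = γ
ν = ŷ[2:2, :]
α = ŷ[3:3, :]
β = ŷ[4:4, :]

epistemic = uncertainty(ν, α, β)   # Var[μ] = β / (ν(α - 1))
aleatoric = uncertainty(α, β)      # E[σ²]  = β / (α - 1)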

Deep Evidential Classification

Not yet implemented.

Functions

EvidentialFlux.NIG — Type
NIG(in => out, σ=NNlib.softplus; bias=true, init=Flux.glorot_uniform)
NIG(W::AbstractMatrix, [bias, σ])

Create a fully connected layer which implements the NormalInverseGamma Evidential distribution whose forward pass is simply given by:

y = W * x .+ bias

The input x should be a vector of length in, a batch of vectors represented as an in × N matrix, or any array with size(x, 1) == in. The output y will be a vector of length out*4, or a batch with size(y) == (out*4, size(x)[2:end]...). The function σ is applied to every row of y except the first out rows, which hold γ. The keyword bias=false switches off the trainable bias for the layer. The weight matrix is initialised as W = init(out*4, in), calling the function given by the init keyword, with glorot_uniform as the default. The weight matrix and/or the bias vector may also be provided explicitly; remember that in this case the number of rows in the weight matrix W MUST be a multiple of 4, and the same holds true for the bias vector. A usage sketch follows the argument list below.

Arguments:

  • (in, out): number of input and output neurons
  • σ: The function used to ensure positive-only outputs; defaults to the softplus function.
  • init: The function to use to initialise the weight matrix.
  • bias: Whether to include a trainable bias vector.
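A minimal construction sketch, spelling out the documented defaults explicitly; the input, batch size and layer width are arbitrary choices for illustration.

using Flux, EvidentialFlux

# Construct the layer with the documented defaults made explicit.
layer = NIG(8 => 2, Flux.softplus; bias = true, init = Flux.glorot_uniform)

x = randn(Float32, 8, 5)   # in = 8, batch of 5
y = layer(x)               # size (out*4, 5) == (8, 5)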
EvidentialFlux.predict — Function
predict(m, x)

Returns the predictions along with the epistemic and aleatoric uncertainty.

Arguments:

  • m: the model, whose last layer must be a Normal Inverse Gamma (NIG) layer
  • x: the input data, given as an array or vector
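A usage sketch for predict. The assumption here is that it returns the prediction first, followed by the epistemic and aleatoric uncertainties; check the return order in your installed version.

using Flux, EvidentialFlux

# A model whose last layer is NIG, as required by predict.
m = Chain(Dense(3 => 32, tanh), NIG(32 => 1))
x = randn(Float32, 3, 100)

ŷ, epistemic, aleatoric = predict(m, x)   # assumed return order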
EvidentialFlux.uncertainty — Function
uncertainty(ν, α, β)

Calculates the epistemic uncertainty of the predictions from the Normal Inverse Gamma (NIG) model. Given a $\mathcal{N}\text{-}\Gamma^{-1}(\gamma, \nu, \alpha, \beta)$ distribution we can calculate the epistemic uncertainty as

$\text{Var}[\mu] = \frac{\beta}{\nu(\alpha-1)}$

Arguments:

  • ν: the ν parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution, which relates to its uncertainty and whose shape should be (O, B)
uncertainty(α, β)

Calculates the aleatoric uncertainty of the predictions from the Normal Inverse Gamma (NIG) model. Given a $\mathcal{N}\text{-}\Gamma^{-1}(\gamma, \nu, \alpha, \beta)$ distribution we can calculate the aleatoric uncertainty as

$\mathbb{E}[\sigma^2] = \frac{\beta}{\alpha-1}$

Arguments:

  • α: the α parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution, which relates to its uncertainty and whose shape should be (O, B)
uncertainty(α)

Calculates the epistemic uncertainty associated with a Multinomial Dirichlet (DIR) layer.

  • α: the α parameter of the Dirichlet distribution, which relates to its concentrations and whose shape should be (O, B)
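A small sketch of calling the NIG uncertainty methods directly on parameter arrays of shape (O, B); the parameter values are arbitrary and only chosen so the arithmetic is easy to verify against the formulas above.

using EvidentialFlux

ν = fill(2.0f0, 1, 4)
α = fill(3.0f0, 1, 4)
β = fill(0.5f0, 1, 4)

uncertainty(ν, α, β)   # epistemic: β ./ (ν .* (α .- 1)) == 0.125
uncertainty(α, β)      # aleatoric: β ./ (α .- 1)        == 0.25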
EvidentialFlux.evidence — Function
evidence(α)

Calculates the total evidence of assigning each observation in α to the respective class for a DIR layer.

  • α: the α parameter of the Dirichlet distribution, which relates to its concentrations and whose shape should be (O, B)
evidence(ν, α)

Returns the evidence for the data pushed through the NIG layer. In this setting, one way of looking at the NIG distribution is as ν virtual observations governing the mean μ of the likelihood and α virtual observations governing the variance $\sigma^2$. The evidence is then a sum of the virtual observations. Amini et al. go through this interpretation in their 2020 paper[amini2020].

Arguments:

  • ν: the ν parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
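A usage sketch for the NIG method. Following Amini et al. (2020), the total evidence is usually taken to be 2ν + α, i.e. the sum of the virtual observations, but treat that exact formula as an assumption about this implementation.

using EvidentialFlux

ν = fill(2.0f0, 1, 4)
α = fill(3.0f0, 1, 4)

Φ = evidence(ν, α)   # total evidence per target and observation, shape (1, 4)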
EvidentialFlux.nigloss — Function
nigloss(y, γ, ν, α, β, λ = 1, ϵ = 0.0001)

This is the standard loss function for Evidential Inference given a NormalInverseGamma posterior over the parameters of the Gaussian likelihood: μ and $\sigma^2$.

Arguments:

  • y: the targets whose shape should be (O, B)
  • γ: the γ parameter of the NIG distribution, which corresponds to its mean and whose shape should be (O, B)
  • ν: the ν parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • α: the α parameter of the NIG distribution, which relates to its precision and whose shape should be (O, B)
  • β: the β parameter of the NIG distribution, which relates to its uncertainty and whose shape should be (O, B)
  • λ: the weight to put on the regularizer (default: 1)
  • ϵ: the threshold for the regularizer (default: 0.0001)
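A minimal training sketch with nigloss. The (γ, ν, α, β) row layout of the NIG output, the assumption that nigloss returns an elementwise array to be averaged, and the hyperparameters are all illustrative assumptions.

using Flux, EvidentialFlux, Statistics

m = Chain(Dense(1 => 32, relu), NIG(32 => 1))
opt = Flux.setup(Adam(1f-3), m)

x = randn(Float32, 1, 128)
y = sin.(x) .+ 0.1f0 .* randn(Float32, 1, 128)

for epoch in 1:100
    loss, grads = Flux.withgradient(m) do model
        ŷ = model(x)
        γ, ν, α, β = ŷ[1:1, :], ŷ[2:2, :], ŷ[3:3, :], ŷ[4:4, :]
        mean(nigloss(y, γ, ν, α, β, 0.1f0, 1f-4))   # assumes an elementwise loss
    end
    Flux.update!(opt, m, grads[1])
end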


References

  • [amini2020] Amini, Alexander, Wilko Schwarting, Ava Soleimany, and Daniela Rus. "Deep Evidential Regression." arXiv:1910.02600 [cs, stat], November 24, 2020. http://arxiv.org/abs/1910.02600.