Kullback-Leibler Divergences
Using the `Distribution` types provided by Distributions.jl, the `KullbackLeibler` method offers a convenient way of computing the Kullback-Leibler divergence between distributions. In several cases, an analytical expression for the Kullback-Leibler divergence is known; these include the (univariate and multivariate) Normal, Cauchy, Exponential, Weibull, and Gamma distributions.
Furthermore, for distributions over a one-dimensional domain for which no analytic result is known, `KullbackLeibler` rephrases the integral in terms of an ODE and employs an efficient integration scheme from the DifferentialEquations.jl suite. For multivariate distributions, Monte Carlo integration is used.
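To make the closed-form case concrete, the well-known analytic expression for the Kullback-Leibler divergence between two univariate normal distributions can be checked against direct numerical integration of the defining integral $\int p \log(p/q)$. The following sketch uses Python with SciPy purely for illustration; it does not use InformationGeometry.jl, and all function names are ours:

```python
import numpy as np
from scipy.integrate import quad

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kl_normal_analytic(mu1, s1, mu2, s2):
    # Closed-form KL( N(mu1, s1^2) || N(mu2, s2^2) )
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

def kl_numeric(p, q, lo, hi):
    # Direct quadrature of the defining integral  ∫ p(x) log(p(x)/q(x)) dx
    val, _ = quad(lambda x: p(x) * np.log(p(x) / q(x)), lo, hi, limit=200)
    return val

p = lambda x: normal_pdf(x, 0.0, 1.0)
q = lambda x: normal_pdf(x, 1.0, 2.0)
analytic = kl_normal_analytic(0.0, 1.0, 1.0, 2.0)   # log(2) + 2/8 - 1/2
numeric = kl_numeric(p, q, -30.0, 30.0)             # tails beyond ±30 are negligible
print(analytic, numeric)
```

The two values agree to high precision, which is exactly why the package can skip numerical integration whenever such a closed form is available.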
Examples of use:
KullbackLeibler(Cauchy(1.,2.4), Normal(-4,0.5), HyperCube([-100,100]); tol=1e-12)
KullbackLeibler(MvNormal([0,2.5],diagm([1,4.])), MvTDist(1,[3,2],diagm([2.,3.])), HyperCube([[-50,50],[-50,50]]); Carlo=true, N=Int(1e8))
In addition, it is also possible to pass generic functions as the distributions, in which case their positivity and normalization must be ensured by the user.
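The role of these user-side responsibilities can be illustrated with a small Python sketch (again not using InformationGeometry.jl; `kl_divergence` is a hypothetical helper of ours) that checks normalization before integrating the divergence between two generic densities:

```python
import numpy as np
from scipy.integrate import quad

def kl_divergence(p, q, lo, hi):
    """KL divergence ∫ p log(p/q) between two generic densities on [lo, hi].

    The caller is responsible for p and q being positive on the domain;
    normalization is sanity-checked here before integrating.
    """
    mass_p, _ = quad(p, lo, hi)
    mass_q, _ = quad(q, lo, hi)
    assert abs(mass_p - 1.0) < 1e-8 and abs(mass_q - 1.0) < 1e-8, \
        "densities must integrate to 1 over the domain"
    val, _ = quad(lambda x: p(x) * np.log(p(x) / q(x)), lo, hi)
    return val

# Example: triangular density on [0, 2] versus the uniform density on [0, 2];
# the exact answer here is log(2) - 1/2.
p = lambda x: 1.0 - abs(x - 1.0)   # triangle peaking at x = 1
q = lambda x: 0.5                  # uniform on [0, 2]
kl = kl_divergence(p, q, 0.0, 2.0)
print(kl)
```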
InformationGeometry.KullbackLeibler — Method

KullbackLeibler(p::Function, q::Function, Domain::HyperCube=HyperCube([-15,15]); tol=2e-15, N::Int=Int(3e7), Carlo::Bool=(length(Domain)!=1))

Computes the Kullback-Leibler divergence between two probability distributions `p` and `q` over the `Domain`. If `Carlo=true`, this is done using a Monte Carlo simulation with `N` samples. If the `Domain` is one-dimensional, the calculation is performed without Monte Carlo to a tolerance of ≈ `tol`.
For example, the Kullback-Leibler divergence between a Cauchy distribution with $\mu=1$ and $s=2$ and a normal (i.e. Gaussian) distribution with $\mu=-4$ and $\sigma=1/2$ can be calculated via:
using InformationGeometry # hide
using LinearAlgebra, Distributions
KullbackLeibler(Cauchy(1.,2.), Normal(-4.,0.5), HyperCube([-100,100]); tol=1e-12)
Specifically, the keyword arguments used here numerically compute the divergence over the domain $[-100,100]$ to a tolerance of $\approx 10^{-12}$.
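The ODE reformulation mentioned above for the one-dimensional case can be sketched as follows: the running integral $D(t) = \int_a^t p \log(p/q)\,dx$ satisfies $D'(t) = p(t)\log(p(t)/q(t))$ with $D(a) = 0$, so an adaptive ODE solver yields the divergence as $D(b)$. The sketch below illustrates this idea in Python with SciPy rather than DifferentialEquations.jl, for the same Cauchy-vs-Normal pair; working with log-densities avoids overflow in the ratio $p/q$ far from the Gaussian's mean:

```python
import numpy as np
from scipy.integrate import quad, solve_ivp

def cauchy_logpdf(x, mu, s):
    return np.log(s / np.pi) - np.log(s**2 + (x - mu) ** 2)

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Integrand of KL( Cauchy(1, 2) || Normal(-4, 0.5) ), via log-densities
def integrand(x):
    lp = cauchy_logpdf(x, 1.0, 2.0)
    lq = normal_logpdf(x, -4.0, 0.5)
    return np.exp(lp) * (lp - lq)

# Recast ∫_a^t integrand dx as an ODE:  D'(t) = integrand(t),  D(a) = 0
sol = solve_ivp(lambda t, D: [integrand(t)], (-100.0, 100.0), [0.0],
                rtol=1e-10, atol=1e-12)
kl_ode = sol.y[0, -1]

# Cross-check against ordinary adaptive quadrature over the same domain
kl_quad, _ = quad(integrand, -100.0, 100.0, limit=500)
print(kl_ode, kl_quad)
```

Note that this particular divergence is only finite because the domain is finite: the heavy Cauchy tails against the fast-decaying Gaussian make the integrand approach a nonzero constant for large $|x|$, which is why a finite HyperCube must be supplied.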
The domain of the integral involved in the computation of the divergence is specified using the `HyperCube` datatype, which stores a cuboid region in $N$ dimensions as a vector of intervals.
InformationGeometry.HyperCube — Type

The `HyperCube` type is used to specify a cuboid region in the form of a cartesian product of $N$ real intervals, thereby offering a convenient way of passing domains for integration or plotting between functions. A `HyperCube` object `cube` has two fields: `cube.L` and `cube.U`, two vectors which respectively store the lower and upper boundaries of the real intervals in order. Examples for constructing `HyperCube`s:
HyperCube([[1,3],[pi,2π],[-500,100]])
HyperCube([1,π,-500],[3,2π,100])
HyperCube([[-1,1]])
HyperCube([-1,1])
HyperCube(collect([-7,7.] for i in 1:3))
Examples of quantities that can be computed from, and operations involving, a `HyperCube` object `X`:
CubeVol(X)
TranslateCube(X,v::Vector)
CubeWidths(X)
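For a feel of the underlying data layout and of what these three operations compute, here is a hypothetical, minimal Python analogue of the `HyperCube` type (the actual Julia implementation may differ in details):

```python
import numpy as np

class HyperCube:
    """Cartesian product of real intervals, stored as lower/upper bound vectors."""
    def __init__(self, L, U):
        self.L = np.asarray(L, dtype=float)  # lower boundaries, in order
        self.U = np.asarray(U, dtype=float)  # upper boundaries, in order
        assert np.all(self.L <= self.U), "each interval needs L[i] <= U[i]"

def CubeWidths(X):
    return X.U - X.L                         # side length of each interval

def CubeVol(X):
    return float(np.prod(X.U - X.L))         # volume of the cuboid

def TranslateCube(X, v):
    return HyperCube(X.L + np.asarray(v), X.U + np.asarray(v))

X = HyperCube([1.0, np.pi, -500.0], [3.0, 2 * np.pi, 100.0])
print(CubeWidths(X))                         # widths 2, pi, 600
print(CubeVol(X))                            # 2 * pi * 600
Y = TranslateCube(X, [1.0, 0.0, 0.0])
print(Y.L[0], Y.U[0])
```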
Furthermore, the Kullback-Leibler divergence between multivariate distributions can be computed for example by
KullbackLeibler(MvNormal([0,2.5],diagm([1,4.])), MvTDist(1,[3,2],diagm([2.,3.])), HyperCube([[-20,50],[-20,50]]); tol=1e-8)
using adaptive integration methods from HCubature.jl. Alternatively, Monte Carlo integration can be employed by specifying `Carlo=true`:
KullbackLeibler(MvNormal([0,2.5],diagm([1,4.])), MvTDist(1,[3,2],diagm([2.,3.])), HyperCube([[-50,50],[-50,50]]); Carlo=true, N=Int(5e6))
In this case, the keyword argument `N` determines the number of points at which the integrand is evaluated over the given domain $[-50,50] \times [-50,50]$.
So far, importance sampling has not been implemented for the Monte Carlo integration. Instead, the domain is sampled uniformly.
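The uniform-sampling estimator described here is simply the domain volume times the sample mean of the integrand. As an illustration (in Python, not using the package; all names are ours), the following sketch estimates the divergence between two bivariate normals with identity covariance, where the closed form $\tfrac{1}{2}\lVert \mu_p - \mu_q \rVert^2$ is available for checking:

```python
import numpy as np

def kl_monte_carlo(logp, logq, lows, highs, N, seed=0):
    """Uniform-sampling Monte Carlo estimate of ∫ p log(p/q) over a cuboid."""
    rng = np.random.default_rng(seed)
    lows, highs = np.asarray(lows, float), np.asarray(highs, float)
    x = rng.uniform(lows, highs, size=(N, len(lows)))  # uniform samples, no importance sampling
    lp = logp(x)
    vals = np.exp(lp) * (lp - logq(x))                 # integrand p * log(p/q)
    volume = np.prod(highs - lows)                     # estimator: volume * mean(integrand)
    return volume * vals.mean()

def mvn_logpdf(x, mu, var):
    # Isotropic bivariate normal with covariance var * I
    d = x - mu
    return -0.5 * np.sum(d * d, axis=-1) / var - np.log(2 * np.pi * var)

logp = lambda x: mvn_logpdf(x, np.array([0.0, 2.5]), 1.0)
logq = lambda x: mvn_logpdf(x, np.array([1.0, 0.0]), 1.0)
est = kl_monte_carlo(logp, logq, [-10.0, -10.0], [12.0, 12.0], N=1_000_000)
# Closed form for equal covariance I:  0.5 * ||mu_p - mu_q||^2 = 0.5 * (1 + 2.5^2)
print(est)
```

Because no importance sampling is used, the statistical error scales with the domain volume, so choosing a HyperCube much larger than the support of `p` wastes samples.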