Kullback-Leibler Divergences

Using the distribution types provided by Distributions.jl, the KullbackLeibler method offers a convenient way of computing the Kullback-Leibler divergence between two distributions. For several families, an analytic expression for the Kullback-Leibler divergence is known and used directly; these include the (univariate and multivariate) Normal, Cauchy, Exponential, Weibull, and Gamma distributions.
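For instance, the divergence between two univariate normal distributions has a well-known closed form. The following sketch illustrates this analytic formula; kl_normals is a hypothetical helper written here for illustration, not part of the package:

```julia
using Distributions

# Closed-form KL divergence between two univariate normals:
# KL(N(μ₁,σ₁²) ‖ N(μ₂,σ₂²)) = log(σ₂/σ₁) + (σ₁² + (μ₁-μ₂)²) / (2σ₂²) - 1/2
kl_normals(p::Normal, q::Normal) =
    log(q.σ / p.σ) + (p.σ^2 + (p.μ - q.μ)^2) / (2q.σ^2) - 1/2

kl_normals(Normal(0, 1), Normal(1, 2))   # ≈ 0.443147
```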

Furthermore, for distributions over a one-dimensional domain where no analytic result is known, KullbackLeibler rephrases the integral in terms of an ODE and employs an efficient integration scheme from the DifferentialEquations.jl suite. For multivariate distributions, Monte Carlo integration is used.
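To illustrate what is being computed in the one-dimensional case, the sketch below evaluates the same integral with adaptive quadrature from QuadGK.jl. This is only a conceptual stand-in: the package itself reformulates the integral as an ODE rather than using QuadGK.jl.

```julia
using Distributions, QuadGK

p, q = Cauchy(1., 2.), Normal(-4., 0.5)

# KL integrand p(y)·log(p(y)/q(y)), integrated over a finite domain
integrand(y) = pdf(p, y) * (logpdf(p, y) - logpdf(q, y))
val, err = quadgk(integrand, -100, 100; rtol=1e-12)
```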

Examples of use:

```julia
KullbackLeibler(Cauchy(1., 2.4), Normal(-4, 0.5), HyperCube([-100, 100]); Carlo=false, tol=1e-12)
KullbackLeibler(MvNormal([0, 2.5], diagm([1, 4.])), MvTDist(1, [3, 2], diagm([2., 3.])), HyperCube([[-50, 50], [-50, 50]]); N=Int(1e8))
```

In addition, it is also possible to pass in generic functions, in which case the user is responsible for ensuring that they are positive and normalized over the domain.
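For example, hand-written densities can be compared as follows; the densities below are chosen purely for illustration:

```julia
using InformationGeometry

# User-supplied densities; positivity and normalization are the user's responsibility.
p(x) = exp(-(x - 1)^2 / 2) / sqrt(2π)   # normal density with μ = 1, σ = 1
q(x) = exp(-abs(x)) / 2                 # Laplace density with μ = 0, b = 1

KullbackLeibler(p, q, HyperCube([-15, 15]); Carlo=false, tol=1e-9)
```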

InformationGeometry.KullbackLeibler — Method

```julia
KullbackLeibler(p::Function, q::Function, Domain::HyperCube=HyperCube([[-15,15]]); tol=2e-15, N::Int=Int(3e7), Carlo::Bool=(Domain.dim != 1))
```

Computes the Kullback-Leibler divergence between two probability distributions p and q over the given Domain. If Carlo=true, this is done via a Monte Carlo simulation with N samples. If the Domain is one-dimensional, the calculation is instead performed without Monte Carlo to a tolerance of approximately tol.

\[D_{\text{KL}}[p,q] \coloneqq \int \mathrm{d}^m y \, p(y) \, \ln \bigg( \frac{p(y)}{q(y)} \bigg)\]

For example, the Kullback-Leibler divergence between a Cauchy distribution with $\mu=1$ and $s=2$ and a normal (i.e. Gaussian) distribution with $\mu=-4$ and $\sigma=1/2$ can be calculated via:

```julia
using InformationGeometry # hide
using LinearAlgebra, Distributions
KullbackLeibler(Cauchy(1., 2.), Normal(-4., 0.5), HyperCube([-100, 100]); Carlo=false, tol=1e-12)
```

Specifically, the keyword arguments used here request that the divergence be computed numerically over the domain $[-100,100]$ to an accuracy of approximately $10^{-12}$.

The domain of the integral involved in the computation of the divergence is specified using the HyperCube datatype, which stores a cuboid region in $N$ dimensions as a vector of intervals.

InformationGeometry.HyperCube — Type

The HyperCube type has two fields: vals::Vector{Vector}, which stores the intervals that define the hypercube, and dim::Int, which gives its dimension. Overall, it offers a convenient and standardized way of passing domains for integration or plotting between functions without having to check that these domains are sensible every time. Examples for constructing HyperCubes:

```julia
HyperCube([[1, 3], [pi, 2pi], [-500.0, 100.0]])
HyperCube([[-1, 1]])
HyperCube([-1, 1])
HyperCube(LowerUpper([-1, -5], [0, -4]))
HyperCube(collect([-7, 7.] for i in 1:3))
```
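Based on the documented fields, the layout of the type can be pictured roughly as follows; this is only a sketch for orientation, not the package's actual definition:

```julia
# Rough sketch of the HyperCube layout (hypothetical, for illustration only)
struct HyperCubeSketch
    vals::Vector{Vector}   # one [lower, upper] interval per dimension
    dim::Int               # number of dimensions, i.e. length(vals)
end
```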

The HyperCube type is closely related to the LowerUpper type, and the two can easily be converted into each other. Examples of quantities that can be computed from a HyperCube object X and of operations involving it (see the usage sketch after the list):

```julia
CubeVol(X)
TranslateCube(X, v::Vector)
CubeWidths(X)
```
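A usage sketch, assuming the semantics suggested by the names (side widths, volume, and translation by a vector); the return values in the comments are assumptions for illustration, not taken from the package documentation:

```julia
X = HyperCube([[-1, 1], [0, 2]])

CubeWidths(X)             # side lengths of the cuboid, presumably [2, 2]
CubeVol(X)                # volume of the cuboid, presumably 4
TranslateCube(X, [1, 1])  # cuboid shifted by the vector, presumably [[0, 2], [1, 3]]
```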

Furthermore, the Kullback-Leibler divergence between multivariate distributions can be computed, for example, via

```julia
KullbackLeibler(MvNormal([0, 2.5], diagm([1, 4.])), MvTDist(1, [3, 2], diagm([2., 3.])), HyperCube([[-50, 50], [-50, 50]]); N=Int(5e6))
```

where it now becomes necessary to employ Monte Carlo schemes. Specifically, the keyword argument N determines the number of points at which the integrand is evaluated over the domain $[-50,50] \times [-50,50]$.

So far, importance sampling has not been implemented for the Monte Carlo integration. Instead, the domain is sampled uniformly.
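The following sketch shows what such a uniform-sampling Monte Carlo estimate looks like in principle; kl_uniform_mc is a hypothetical helper written for illustration, not the package's internal implementation:

```julia
using Distributions, LinearAlgebra, Random

P = MvNormal([0, 2.5], diagm([1, 4.]))
Q = MvTDist(1, [3., 2.], diagm([2., 3.]))

# Uniform Monte Carlo: draw N points uniformly from the cuboid [lower, upper],
# average the integrand p(y)·log(p(y)/q(y)), and multiply by the domain volume.
function kl_uniform_mc(P, Q, lower, upper; N=10^6, rng=Random.default_rng())
    vol = prod(upper .- lower)
    total = 0.0
    for _ in 1:N
        y = lower .+ rand(rng, length(lower)) .* (upper .- lower)
        total += pdf(P, y) * (logpdf(P, y) - logpdf(Q, y))
    end
    vol * total / N
end

kl_uniform_mc(P, Q, [-50., -50.], [50., 50.])
```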