Confidence Regions

InformationGeometry.DataSet — Type

The DataSet type is a container for data points. It holds 3 vectors x, y, sigma where the components of sigma quantify the standard deviation associated with each measurement. For example,

DS = DataSet([1,2,3.],[4,5,6.5],[0.5,0.45,0.6])

Its fields can be obtained via xdata(DS), ydata(DS), sigma(DS).

InformationGeometry.DataModel — Type

In addition to a DataSet, a DataModel contains the model as a function model(x,θ) and its derivative dmodel(x,θ) where x denotes the x-value of the data and θ is a vector of parameters on which the model depends. Crucially, dmodel contains the derivatives of the model with respect to the parameters θ, not the x-values. For example

DS = DataSet([1,2,3.],[4,5,6.5],[0.5,0.45,0.6])
model(x,p::Vector) = p[1] .* x .+ p[2]
DM = DataModel(DS,model)

If provided like this, the gradient of the model with respect to the parameters p (i.e. its "Jacobian") will be calculated using automatic differentiation. Alternatively, an explicit analytic expression for the Jacobian can be specified by hand:

function dmodel(x,p::Vector)
   J = Array{Float64}(undef, length(x), length(p))
   @. J[:,1] = x        # ∂(model)/∂p₁
   @. J[:,2] = 1.       # ∂(model)/∂p₂
   return J
end
DM = DataModel(DS,model,dmodel)

The output of the Jacobian must be a matrix whose columns correspond to the partial derivatives with respect to different components of p and whose rows correspond to evaluations at different values of x.

StatsBase.loglikelihood — Method

loglikelihood(DM::DataModel, θ::Vector)

Calculates the logarithm of the likelihood $\ell(\mathrm{data} \, | \, \theta) \coloneqq \mathrm{ln} \big( L(\mathrm{data} \, | \, \theta) \big)$ given a DataModel and a parameter configuration $\theta$.

InformationGeometry.FindMLE — Method

FindMLE(DM::DataModel,start::Union{Bool,Vector}=false; Big::Bool=false, max::Int=50) -> Vector

Finds the maximum likelihood parameter configuration given a DataModel and optionally a starting configuration. Big=true will return the value as a BigFloat. If no starting value is provided (i.e. start=false) the dimension of the parameter space is inferred automatically and the initial configuration is chosen as start=ones(dim).

using InformationGeometry, Plots; gr() # hide
DS = DataSet([1,2,3.],[4,5,6.5],[0.5,0.45,0.6])
model(x,θ) = θ[1] .* x .+ θ[2]
DM = DataModel(DS,model)
MLE = FindMLE(DM)

Depending on how the parameters $\theta$ enter into the model, the shapes of confidence regions associated with the model may be distorted. For the linearly parametrized model shown above, the $1 \sigma$ and $2 \sigma$ confidence regions form perfect hyperellipses as expected:

sols = GenerateMultipleIntervals(DM,1:2)
VisualizeSols(sols)
plot(sols[1],vars=(1,2),label="1σ CR",title="Confidence Regions for linearly parametrized model", xlabel="θ[1]", ylabel="θ[2]") # hide
plot!(sols[2],vars=(1,2),label="2σ CR") # hide
scatter!([MLE[1]],[MLE[2]],marker=:c,label="MLE") # hide
savefig("../assets/sols.svg"); nothing # hide

For a non-linearly parametrized model, the confidence regions are found to be non-ellipsoidal:

model2(x,θ) = θ[1]^3 .* x .+ exp(θ[1] + θ[2])
DM2 = DataModel(DS,model2)
sols2 = GenerateMultipleIntervals(DM2,1:2)
VisualizeSols(sols2)
#plot(sols2[1],vars=(1,2),label="1σ CR",title="Confidence Regions for non-linearly parametrized model", xlabel="θ[1]", ylabel="θ[2]") # hide
#plot!(sols2[2],vars=(1,2),label="2σ CR") # hide
#MLE2 = FindMLE(DM2);  scatter!([MLE2[1]],[MLE2[2]],marker=:c,label="MLE") # hide
#savefig("../assets/sols2.svg"); nothing # hide

Since both finding and visualizing exact confidence regions for models depending on more than two parameters (i.e. $\mathrm{dim} \, \mathcal{M} > 2$) is more challenging from a technical perspective, the above methods only work for $\mathrm{dim} \, \mathcal{M} = 2$ at this point in time. However, methods which allow for visualizations of confidence regions in arbitrary three-dimensional subspaces of parameter manifolds of any dimension are close to being finished and will follow soon.

Various geometric quantities which are intrinsic to the parameter manifold $\mathcal{M}$ can be computed as a result of the Fisher metric $g$ (and subsequent choice of the Levi-Civita connection) such as the Riemann and Ricci tensors and the Ricci scalar $R$.

InformationGeometry.FisherMetric — Method

FisherMetric(DM::DataModel, θ::Vector{<:Real})

Computes the Fisher metric $g$ given a DataModel and a parameter configuration $\theta$ under the assumption that the likelihood $L(\mathrm{data} \, | \, \theta)$ is a multivariate normal distribution.

\[g_{ab}(\theta) \coloneqq -\int_{\mathcal{D}} \mathrm{d}^m y_{\mathrm{data}} \, L(y_{\mathrm{data}} \,|\, \theta) \, \frac{\partial^2 \, \mathrm{ln}(L)}{\partial \theta^a \, \partial \theta^b} = -\mathbb{E} \bigg( \frac{\partial^2 \, \mathrm{ln}(L)}{\partial \theta^a \, \partial \theta^b} \bigg)\]

InformationGeometry.GeometricDensity — Method

GeometricDensity(DM::DataModel, θ::Vector)

Computes the square root of the determinant of the Fisher metric $\sqrt{\mathrm{det}\big(g(\theta)\big)}$ at the point $\theta$.

InformationGeometry.ChristoffelSymbol — Method

ChristoffelSymbol(DM::DataModel, θ::Vector; BigCalc::Bool=false)
ChristoffelSymbol(Metric::Function, θ::Vector; BigCalc::Bool=false)

Calculates the components of the (1,2) Christoffel symbol $\Gamma$ at a point $\theta$ (i.e. the Christoffel symbol "of the second kind") through finite differencing of the Metric. Accurate to ≈ 3e-11. BigCalc=true increases accuracy through BigFloat calculation.

InformationGeometry.Riemann — Method

Riemann(DM::DataModel, θ::Vector; BigCalc::Bool=false)
Riemann(Metric::Function, θ::Vector; BigCalc::Bool=false)

Calculates the components of the (1,3) Riemann tensor by finite differencing of the Metric. BigCalc=true increases accuracy through BigFloat calculation.

InformationGeometry.Ricci — Method

Ricci(DM::DataModel, θ::Vector; BigCalc::Bool=false)
Ricci(Metric::Function, θ::Vector; BigCalc::Bool=false)

Calculates the components of the (0,2) Ricci tensor by finite differencing of Metric. BigCalc=true increases accuracy through BigFloat calculation.

InformationGeometry.RicciScalar — Method

RicciScalar(DM::DataModel, θ::Vector; BigCalc::Bool=false)
RicciScalar(Metric::Function, θ::Vector; BigCalc::Bool=false)

Calculates the Ricci scalar by finite differencing of the Metric. BigCalc=true increases accuracy through BigFloat calculation.

Further, studying the geodesics / autoparallels on a manifold can yield enlightening insights about its geometry.

InformationGeometry.Score — Method

Score(DM::DataModel, θ::Vector{<:Real}; Auto::Bool=false)

Calculates the gradient of the log-likelihood with respect to a set of parameters p. Auto=true uses automatic differentiation.

InformationGeometry.AIC — Method

AIC(DM::DataModel, θ::Vector)

Calculates the Akaike Information Criterion given a parameter configuration $\theta$ defined by $\mathrm{AIC} = -2 \, \ell(\mathrm{data} \, | \, \theta) + 2 \, \mathrm{length}(\theta)$.

InformationGeometry.GeodesicDistance — Method

GeodesicDistance(DM::DataModel,P::Vector{<:Real},Q::Vector{<:Real}; tol::Real=1e-10)
GeodesicDistance(Metric::Function,P::Vector{<:Real},Q::Vector{<:Real}; tol::Real=1e-10)

Computes the length of a geodesic connecting the points P and Q.

To be continued...