Confidence Regions

Once a DataModel object has been defined, it can subsequently be used to compute various quantities as follows:

StatsBase.loglikelihood — Method

loglikelihood(DM::DataModel, θ::AbstractVector) -> Real

Calculates the logarithm of the likelihood $L$, i.e. $\ell(\mathrm{data} \, | \, \theta) \coloneqq \mathrm{ln} \big( L(\mathrm{data} \, | \, \theta) \big)$ given a DataModel and a parameter configuration $\theta$.

InformationGeometry.MLE — Method

MLE(DM::DataModel) -> Vector

Returns the parameter configuration $\theta_\text{MLE} \in \mathcal{M}$ which is estimated to have the highest likelihood of producing the observed data (under the assumption that the specified model captures the true relationship present in the data). For performance reasons, the maximum likelihood estimate is stored as a part of the DataModel type.

InformationGeometry.LogLikeMLE — Method

LogLikeMLE(DM::DataModel) -> Real

Returns the value of the log-likelihood $\ell$ when evaluated at the maximum likelihood estimate, i.e. $\ell(\mathrm{data} \, | \, \theta_\text{MLE})$. For performance reasons, this value is stored as a part of the DataModel type.

using InformationGeometry, Plots; gr() # hide
DS = DataSet([1,2,3.], [4,5,6.5], [0.5,0.45,0.6])
model(x::Real, θ::AbstractVector{<:Real}) = θ[1] * x + θ[2]
DM = DataModel(DS, model)
MLE(DM), LogLikeMLE(DM)

One of the primary goals of InformationGeometry.jl is to enable the user to investigate the relationships between different parameters in a model in detail by determining and visualizing the exact confidence regions associated with the best fit parameters. In this context, exact refers to the fact that no simplifying assumptions are made about the shape of the confidence regions.

Depending on how the parameters $\theta$ enter into the model, the shapes of confidence regions associated with the model may be distorted. For the linearly parametrized model shown above, the $1 \sigma$ and $2 \sigma$ confidence regions form perfect ellipses around the maximum likelihood estimate as expected:

sols = ConfidenceRegions(DM, 1:2; tol=1e-9)
VisualizeSols(sols)
#plot(sols[1],vars=(1,2),label="1σ CR",title="Confidence Regions for linearly parametrized model", xlabel="θ[1]", ylabel="θ[2]") # hide
#plot!(sols[2],vars=(1,2),label="2σ CR") # hide
#scatter!([MLE(DM)[1]],[MLE(DM)[2]],marker=:c,label="MLE") # hide
#savefig("../assets/sols.svg"); nothing # hide

For a non-linearly parametrized model, the confidence regions are no longer ellipsoidal:

model2(x::Real, θ::AbstractVector{<:Real}) = θ[1]^3 * x + exp(θ[1] + θ[2])
DM2 = DataModel(DS, model2)
sols2 = ConfidenceRegions(DM2, 1:2; tol=1e-9)
VisualizeSols(sols2)
#plot(sols2[1],vars=(1,2),label="1σ CR",title="Confidence Regions for non-linearly parametrized model", xlabel="θ[1]", ylabel="θ[2]") # hide
#plot!(sols2[2],vars=(1,2),label="2σ CR") # hide
#scatter!([MLE(DM2)[1]],[MLE(DM2)[2]],marker=:c,label="MLE") # hide
#savefig("../assets/sols2.svg"); nothing # hide

Specifically in the case of two-dimensional parameter spaces as shown here, the problem of finding the exact boundaries of the confidence regions is turned into a system of ordinary differential equations and subsequently solved using the DifferentialEquations.jl suite. As a result, the boundaries of the confidence regions are obtained in the form of ODESolution objects, which come equipped with elaborate interpolation methods.

InformationGeometry.ConfidenceRegions — Method

ConfidenceRegions(DM::DataModel, Range::Union{AbstractRange,AbstractVector})

Computes the boundaries of confidence regions for two-dimensional parameter spaces given a vector or range of confidence levels. A convenient interface which extends this to higher dimensions is currently still under development.

For example,

ConfidenceRegions(DM, 1:3; tol=1e-9)

computes the $1\sigma$, $2\sigma$ and $3\sigma$ confidence regions associated with a given DataModel using a solver tolerance of $10^{-9}$.

Keyword arguments:

IsConfVol = true can be used to specify the desired confidence level directly in terms of a probability $p \in [0,1]$ instead of in units of standard deviations $\sigma$,
tol can be used to quantify the tolerance with which the ODE which defines the confidence boundary is solved (default tol = 1e-12),
meth can be used to specify the solver algorithm (default meth = Tsit5()),
Auto is a boolean which controls whether the derivatives of the likelihood are computed using automatic differentiation or an analytic expression involving DM.dmodel (default Auto = false).

Since both finding and visualizing exact confidence regions for models depending on more than two parameters (i.e. $\mathrm{dim} \, \mathcal{M} > 2$) is more challenging from a technical perspective, the above methods only work for $\mathrm{dim} \, \mathcal{M} = 2$ at this point in time. However, methods which allow for visualizations of confidence regions in arbitrary three-dimensional subspaces of parameter manifolds of any dimension are close to being finished and will follow soon.

Various geometric quantities which are intrinsic to the parameter manifold $\mathcal{M}$ can be computed as a result of the Fisher metric $g$ (and subsequent choice of the Levi-Civita connection) such as the Riemann and Ricci tensors and the Ricci scalar $R$.

InformationGeometry.FisherMetric — Method

FisherMetric(DM::DataModel, θ::AbstractVector{<:Number})

Computes the Fisher metric $g$ given a DataModel and a parameter configuration $\theta$ under the assumption that the likelihood $L(\mathrm{data} \, | \, \theta)$ is a multivariate normal distribution.

\[g_{ab}(\theta) \coloneqq -\int_{\mathcal{D}} \mathrm{d}^m y_{\mathrm{data}} \, L(y_{\mathrm{data}} \,|\, \theta) \, \frac{\partial^2 \, \mathrm{ln}(L)}{\partial \theta^a \, \partial \theta^b} = -\mathbb{E} \bigg( \frac{\partial^2 \, \mathrm{ln}(L)}{\partial \theta^a \, \partial \theta^b} \bigg)\]

InformationGeometry.GeometricDensity — Method

GeometricDensity(DM::DataModel, θ::AbstractVector) -> Real

Computes the square root of the determinant of the Fisher metric $\sqrt{\mathrm{det}\big(g(\theta)\big)}$ at the point $\theta$.

InformationGeometry.ChristoffelSymbol — Method

ChristoffelSymbol(DM::DataModel, θ::AbstractVector; BigCalc::Bool=false)
ChristoffelSymbol(Metric::Function, θ::AbstractVector; BigCalc::Bool=false)

Calculates the components of the $(1,2)$ Christoffel symbol $\Gamma$ at a point $\theta$ (i.e. the Christoffel symbol "of the second kind") through finite differencing of the Metric. Accurate to ≈ 3e-11. BigCalc=true increases accuracy through BigFloat calculation.

InformationGeometry.Riemann — Method

Riemann(DM::DataModel, θ::AbstractVector; BigCalc::Bool=false)
Riemann(Metric::Function, θ::AbstractVector; BigCalc::Bool=false)

Calculates the components of the $(1,3)$ Riemann tensor by finite differencing of the Metric. BigCalc=true increases accuracy through BigFloat calculation.

InformationGeometry.Ricci — Method

Ricci(DM::DataModel, θ::AbstractVector; BigCalc::Bool=false)
Ricci(Metric::Function, θ::AbstractVector; BigCalc::Bool=false)

Calculates the components of the $(0,2)$ Ricci tensor by finite differencing of the Metric. BigCalc=true increases accuracy through BigFloat calculation.

InformationGeometry.RicciScalar — Method

RicciScalar(DM::DataModel, θ::AbstractVector; BigCalc::Bool=false) -> Real
RicciScalar(Metric::Function, θ::AbstractVector; BigCalc::Bool=false) -<> Real

Calculates the Ricci scalar by finite differencing of the Metric. BigCalc=true increases accuracy through BigFloat calculation.

Further, studying the geodesics associated with a metric manifold can yield valuable insights into its geometry.

InformationGeometry.Score — Method

Score(DM::DataModel, θ::AbstractVector{<:Number}; Auto::Bool=false)

Calculates the gradient of the log-likelihood $\ell$ with respect to a set of parameters $\theta$. Auto=true uses automatic differentiation.

InformationGeometry.AIC — Method

AIC(DM::DataModel, θ::AbstractVector) -> Real

Calculates the Akaike Information Criterion given a parameter configuration $\theta$ defined by $\mathrm{AIC} = 2 \, \mathrm{length}(\theta) -2 \, \ell(\mathrm{data} \, | \, \theta)$. Lower values for the AIC indicate that the associated model function is more likely to be correct. For linearly parametrized models and small sample sizes, it is advisable to instead use the AICc which is more accurate.

InformationGeometry.AICc — Method

AICc(DM::DataModel, θ::AbstractVector) -> Real

Computes Akaike Information Criterion with an added correction term that prevents the AIC from selecting models with too many parameters (i.e. overfitting) in the case of small sample sizes. $\mathrm{AICc} = \mathrm{AIC} + \frac{2\mathrm{length}(\theta)^2 + 2 \mathrm{length}(\theta)}{N - \mathrm{length}(\theta) - 1}$ where $N$ is the number of data points. Whereas AIC constitutes a first order estimate of the information loss, the AICc constitutes a second order estimate. However, this particular correction term assumes that the model is linearly parametrized.

InformationGeometry.BIC — Method

BIC(DM::DataModel, θ::AbstractVector) -> Real

Calculates the Bayesian Information Criterion given a parameter configuration $\theta$ defined by $\mathrm{BIC} = \mathrm{ln}(N) \cdot \mathrm{length}(\theta) -2 \, \ell(\mathrm{data} \, | \, \theta)$ where $N$ is the number of data points.

InformationGeometry.GeodesicDistance — Method

GeodesicDistance(DM::DataModel,P::AbstractVector{<:Real},Q::AbstractVector{<:Real}; tol::Real=1e-10)
GeodesicDistance(Metric::Function,P::AbstractVector{<:Real},Q::AbstractVector{<:Real}; tol::Real=1e-10)

Computes the length of a geodesic connecting the points P and Q.