Confidence Regions
InformationGeometry.DataSet
— TypeThe DataSet
type is a container for data points. It holds 3 vectors x
, y
, sigma
where the components of sigma
quantify the standard deviation associated with each measurement. For example,
DS = DataSet([1,2,3.],[4,5,6.5],[0.5,0.45,0.6])
Its fields can be obtained via xdata(DS)
, ydata(DS)
, sigma(DS)
.
InformationGeometry.DataModel
— TypeIn addition to a DataSet
, a DataModel
contains the model as a function model(x,θ)
and its derivative dmodel(x,θ)
where x
denotes the x-value of the data and θ
is a vector of parameters on which the model depends. Crucially, dmodel
contains the derivatives of the model with respect to the parameters θ
, not the x-values. For example
DS = DataSet([1,2,3.],[4,5,6.5],[0.5,0.45,0.6])
model(x,p::Vector) = p[1] .* x .+ p[2]
DM = DataModel(DS,model)
If provided like this, the gradient of the model with respect to the parameters p
(i.e. its "Jacobian") will be calculated using automatic differentiation. Alternatively, an explicit analytic expression for the Jacobian can be specified by hand:
function dmodel(x,p::Vector)
J = Array{Float64}(undef, length(x), length(p))
@. J[:,1] = x # ∂(model)/∂p₁
@. J[:,2] = 1. # ∂(model)/∂p₂
return J
end
DM = DataModel(DS,model,dmodel)
The output of the Jacobian must be a matrix whose columns correspond to the partial derivatives with respect to different components of p
and whose rows correspond to evaluations at different values of x
.
StatsBase.loglikelihood
— Methodloglikelihood(DM::DataModel, θ::Vector)
Calculates the logarithm of the likelihood $\ell(\mathrm{data} \, | \, \theta) \coloneqq \mathrm{ln} \big( L(\mathrm{data} \, | \, \theta) \big)$ given a DataModel
and a parameter configuration $\theta$.
InformationGeometry.FindMLE
— MethodFindMLE(DM::DataModel,start::Union{Bool,Vector}=false; Big::Bool=false, max::Int=50) -> Vector
Finds the maximum likelihood parameter configuration given a DataModel
and optionally a starting configuration. Big=true
will return the value as a BigFloat
. If no starting value is provided (i.e. start=false
) the dimension of the parameter space is inferred automatically and the initial configuration is chosen as start=ones(dim)
.
using InformationGeometry, Plots; gr() # hide
DS = DataSet([1,2,3.],[4,5,6.5],[0.5,0.45,0.6])
model(x,θ) = θ[1] .* x .+ θ[2]
DM = DataModel(DS,model)
MLE = FindMLE(DM)
Depending on how the parameters $\theta$ enter into the model, the shapes of confidence regions associated with the model may be distorted. For the linearly parametrized model shown above, the $1 \sigma$ and $2 \sigma$ confidence regions form perfect hyperellipses as expected:
sols = GenerateMultipleIntervals(DM,1:2)
VisualizeSols(sols)
plot(sols[1],vars=(1,2),label="1σ CR",title="Confidence Regions for linearly parametrized model", xlabel="θ[1]", ylabel="θ[2]") # hide
plot!(sols[2],vars=(1,2),label="2σ CR") # hide
scatter!([MLE[1]],[MLE[2]],marker=:c,label="MLE") # hide
savefig("../assets/sols.svg"); nothing # hide
For a non-linearly parametrized model, the confidence regions are found to be non-ellipsoidal:
model2(x,θ) = θ[1]^3 .* x .+ exp(θ[1] + θ[2])
DM2 = DataModel(DS,model2)
sols2 = GenerateMultipleIntervals(DM2,1:2)
VisualizeSols(sols2)
#plot(sols2[1],vars=(1,2),label="1σ CR",title="Confidence Regions for non-linearly parametrized model", xlabel="θ[1]", ylabel="θ[2]") # hide
#plot!(sols2[2],vars=(1,2),label="2σ CR") # hide
#MLE2 = FindMLE(DM2); scatter!([MLE2[1]],[MLE2[2]],marker=:c,label="MLE") # hide
#savefig("../assets/sols2.svg"); nothing # hide
Since both finding and visualizing exact confidence regions for models depending on more than two parameters (i.e. $\mathrm{dim} \, \mathcal{M} > 2$) is more challenging from a technical perspective, the above methods only work for $\mathrm{dim} \, \mathcal{M} = 2$ at this point in time. However, methods which allow for visualizations of confidence regions in arbitrary three-dimensional subspaces of parameter manifolds of any dimension are close to being finished and will follow soon.
Various geometric quantities which are intrinsic to the parameter manifold $\mathcal{M}$ can be computed as a result of the Fisher metric $g$ (and subsequent choice of the Levi-Civita connection) such as the Riemann and Ricci tensors and the Ricci scalar $R$.
InformationGeometry.FisherMetric
— MethodFisherMetric(DM::DataModel, θ::Vector{<:Real})
Computes the Fisher metric $g$ given a DataModel
and a parameter configuration $\theta$ under the assumption that the likelihood $L(\mathrm{data} \, | \, \theta)$ is a multivariate normal distribution.
InformationGeometry.GeometricDensity
— MethodGeometricDensity(DM::DataModel, θ::Vector)
Computes the square root of the determinant of the Fisher metric $\sqrt{\mathrm{det}\big(g(\theta)\big)}$ at the point $\theta$.
InformationGeometry.ChristoffelSymbol
— MethodChristoffelSymbol(DM::DataModel, θ::Vector; BigCalc::Bool=false)
ChristoffelSymbol(Metric::Function, θ::Vector; BigCalc::Bool=false)
Calculates the components of the (1,2) Christoffel symbol $\Gamma$ at a point $\theta$ (i.e. the Christoffel symbol "of the second kind") through finite differencing of the Metric
. Accurate to ≈ 3e-11. BigCalc=true
increases accuracy through BigFloat calculation.
InformationGeometry.Riemann
— MethodRiemann(DM::DataModel, θ::Vector; BigCalc::Bool=false)
Riemann(Metric::Function, θ::Vector; BigCalc::Bool=false)
Calculates the components of the (1,3) Riemann tensor by finite differencing of the Metric
. BigCalc=true
increases accuracy through BigFloat calculation.
InformationGeometry.Ricci
— MethodRicci(DM::DataModel, θ::Vector; BigCalc::Bool=false)
Ricci(Metric::Function, θ::Vector; BigCalc::Bool=false)
Calculates the components of the (0,2) Ricci tensor by finite differencing of Metric
. BigCalc=true
increases accuracy through BigFloat calculation.
InformationGeometry.RicciScalar
— MethodRicciScalar(DM::DataModel, θ::Vector; BigCalc::Bool=false)
RicciScalar(Metric::Function, θ::Vector; BigCalc::Bool=false)
Calculates the Ricci scalar by finite differencing of the Metric
. BigCalc=true
increases accuracy through BigFloat
calculation.
Further, studying the geodesics / autoparallels on a manifold can yield enlightening insights about its geometry.
InformationGeometry.Score
— MethodScore(DM::DataModel, θ::Vector{<:Real}; Auto::Bool=false)
Calculates the gradient of the log-likelihood with respect to a set of parameters p
. Auto=true
uses automatic differentiation.
InformationGeometry.AIC
— MethodAIC(DM::DataModel, θ::Vector)
Calculates the Akaike Information Criterion given a parameter configuration $\theta$ defined by $\mathrm{AIC} = -2 \, \ell(\mathrm{data} \, | \, \theta) + 2 \, \mathrm{length}(\theta)$.
InformationGeometry.GeodesicDistance
— MethodGeodesicDistance(DM::DataModel,P::Vector{<:Real},Q::Vector{<:Real}; tol::Real=1e-10)
GeodesicDistance(Metric::Function,P::Vector{<:Real},Q::Vector{<:Real}; tol::Real=1e-10)
Computes the length of a geodesic connecting the points P
and Q
.
To be continued...