EnergyStatistics.jl
In statistics, distance correlation (or distance covariance) is a measure of dependence between two paired random vectors. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables. See the References section for more details.
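For orientation, the standard sample definitions (following Székely, Rizzo and Bakirov) can be written as below. The symbols a_{jk}, A_{jk} and B_{jk} are notation introduced here for illustration; they do not appear in the package documentation, and whether the package follows exactly this normalization is an assumption.

```latex
% Pairwise distances and their double-centered versions for a sample x_1, ..., x_n
a_{jk} = |x_j - x_k|, \qquad
A_{jk} = a_{jk} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}
% (B_{jk} is defined in the same way from the sample y_1, ..., y_n)

% Squared sample distance covariance and distance variance
\operatorname{dCov}_n^2(x, y) = \frac{1}{n^2} \sum_{j,k=1}^{n} A_{jk} B_{jk}, \qquad
\operatorname{dVar}_n^2(x) = \frac{1}{n^2} \sum_{j,k=1}^{n} A_{jk}^2

% Squared sample distance correlation
\operatorname{dCor}_n^2(x, y) =
  \frac{\operatorname{dCov}_n^2(x, y)}
       {\sqrt{\operatorname{dVar}_n^2(x)\,\operatorname{dVar}_n^2(y)}}
```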
Installation
This package can be installed using the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run
pkg> add EnergyStatistics
General Usage
Given two vectors x and y, the distance correlation dcor can simply be computed:
using EnergyStatistics
x = collect(-1:0.01:1)
y = map(x -> x^4 - x^2, x)
dcor(x, y) ≈ 0.374204050
These two vectors are clearly associated. However, their (Pearson) correlation coefficient vanishes, which would wrongly suggest that they are unrelated. The nonzero distance correlation dcor reveals their non-linear association.
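To make the contrast with Pearson's correlation concrete, here is a minimal from-scratch sketch that uses only the Statistics standard library. The helpers pairwise_abs, double_center and naive_dcor are hypothetical names introduced for illustration; they are not part of EnergyStatistics, and the sketch assumes the standard double-centered sample definition rather than reproducing the package's implementation.

```julia
using Statistics  # standard library: mean, cor

# Pairwise distance matrix |x_i - x_j| for a real-valued sample.
pairwise_abs(x) = [abs(xi - xj) for xi in x, xj in x]

# Double centering: subtract row and column means, add back the grand mean.
double_center(D) = D .- mean(D, dims=1) .- mean(D, dims=2) .+ mean(D)

# Naive O(n^2) sample distance correlation (hypothetical helper, not the package API).
function naive_dcor(x, y)
    A = double_center(pairwise_abs(x))
    B = double_center(pairwise_abs(y))
    dcov2 = mean(A .* B)                       # squared sample distance covariance
    denom = sqrt(mean(A .* A) * mean(B .* B))  # product of squared distance variances
    denom > 0 ? sqrt(max(dcov2, 0.0) / denom) : 0.0
end

x = collect(-1:0.01:1)
y = @. x^4 - x^2

pearson = cor(x, y)    # essentially zero: y is an even function of a symmetric x
dc = naive_dcor(x, y)  # strictly positive, so the dependence is detected
```

The Pearson coefficient is zero up to rounding because y is symmetric in x, while the distance correlation stays clearly away from zero.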
Functions to compute the distance covariance dcov and the distance variance dvar are also supplied.
Advanced Usage
The computation of a DistanceMatrix is computationally expensive. In particular, computing the n(n-1)/2 pairwise distances for vectors of length n and subsequently centering the distance matrix take time and memory. When computing several distance correlations, one can keep and reuse the intermediate results of the distance computation and centering. For example:
Dx = EnergyStatistics.dcenter!(EnergyStatistics.DistanceMatrix(x))
Dy = EnergyStatistics.dcenter!(EnergyStatistics.DistanceMatrix(y))
Dz = EnergyStatistics.dcenter!(EnergyStatistics.DistanceMatrix(z))
dcor_xy = dcor(Dx, Dy)
dcor_xz = dcor(Dx, Dz)
will run faster than
dcor_xy = dcor(x, y)
dcor_xz = dcor(x, z)
since the distance matrix Dx for the vector x is only computed once.
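The reuse pattern can be illustrated without the package by a small sketch in plain Julia. The helper names pairwise_abs, double_center and dcor_centered are hypothetical, and the sketch assumes the standard double-centered definition; it is meant to show why a centered matrix can be shared between comparisons, not how the package stores it.

```julia
using Statistics  # standard library: mean

pairwise_abs(x) = [abs(a - b) for a in x, b in x]
double_center(D) = D .- mean(D, dims=1) .- mean(D, dims=2) .+ mean(D)

# Distance correlation from two already-centered matrices (hypothetical helper).
function dcor_centered(A, B)
    denom = sqrt(mean(A .* A) * mean(B .* B))
    denom > 0 ? sqrt(max(mean(A .* B), 0.0) / denom) : 0.0
end

x = collect(-1:0.05:1)
y = x .^ 2
z = 2 .* x .+ 1

Ax = double_center(pairwise_abs(x))  # the expensive step for x, done once
Ay = double_center(pairwise_abs(y))
Az = double_center(pairwise_abs(z))

d_xy = dcor_centered(Ax, Ay)  # reuses Ax
d_xz = dcor_centered(Ax, Az)  # reuses Ax again; z is a linear function of x
```

Because z is an affine function of x, its centered distance matrix is a multiple of Ax, so d_xz equals 1 up to rounding.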
You can also construct distance matrices using distance measures other than the (default) abs:
AA = EnergyStatistics.dcenter!(EnergyStatistics.DistanceMatrix(Float64, x, abs2))
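As a rough illustration of what a custom distance measure changes, the sketch below builds a pairwise matrix for an arbitrary scalar function. The function pairwise is a hypothetical helper, and applying dist to the difference x[i] - x[j] is an assumption inferred from the unary defaults abs and abs2; the package's internals may differ.

```julia
# Build a symmetric matrix of dist(x_i - x_j) using n(n-1)/2 evaluations of dist.
# Hypothetical helper, not part of EnergyStatistics.
function pairwise(x::AbstractVector{<:Real}, dist=abs)
    n = length(x)
    D = zeros(Float64, n, n)
    for j in 1:n, i in (j + 1):n
        d = dist(x[i] - x[j])
        D[i, j] = d  # lower triangle
        D[j, i] = d  # mirror into the upper triangle
    end
    D
end

x = [1.0, 2.0, 4.0]
D_abs = pairwise(x)        # absolute differences
D_sq = pairwise(x, abs2)   # squared differences
```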
Instead of double centering via dcenter!, one may also use U-centering via the ucenter! function.
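For readers unfamiliar with U-centering, here is a minimal sketch of the textbook definition (Székely and Rizzo) for a symmetric distance matrix with zero diagonal. The function u_center is a hypothetical, non-mutating helper; whether ucenter! matches it term for term is an assumption.

```julia
# U-centering of a symmetric distance matrix with zero diagonal (needs n > 2).
# Hypothetical helper, not the package's ucenter!.
function u_center(D::AbstractMatrix{<:Real})
    n = size(D, 1)
    n > 2 || throw(ArgumentError("U-centering needs at least 3 observations"))
    rowsum = sum(D, dims=2)
    colsum = sum(D, dims=1)
    total = sum(D)
    U = D .- rowsum ./ (n - 2) .- colsum ./ (n - 2) .+ total / ((n - 1) * (n - 2))
    for i in 1:n
        U[i, i] = 0.0  # the diagonal of a U-centered matrix is defined as zero
    end
    U
end

x = [1.0, 2.0, 4.0, 7.0]
D = [abs(a - b) for a in x, b in x]
U = u_center(D)
```

Compared with double centering, the sums are divided by n - 2 and (n - 1)(n - 2) instead of n and n^2, which is what leads to an unbiased estimator of the squared distance covariance.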
References
See the Wikipedia page on distance correlation for references and more details.
Functions
EnergyStatistics.dcor - Method
dcor(x::AbstractVector{T}, y::AbstractVector{T}) where T <: Real
Computes the distance correlation of samples x and y.
using EnergyStatistics
x = collect(-1:0.01:1)
y = @. x^4 - x^2
dcor(x, y)
# output
0.3742040504583155
EnergyStatistics.dcov - Method
dcov(x::AbstractVector{T}, y::AbstractVector{T}) where T <: Real
Computes the distance covariance of samples x and y.
EnergyStatistics.dvar - Method
dvar(x::AbstractVector{T}) where T <: Real
Computes the distance variance of a sample x.
EnergyStatistics.DistanceMatrix - Method
DistanceMatrix(x::AbstractVector{T}, dist = abs) where {T}
Computes the matrix of pairwise distances of x. The distance measure dist defaults to abs.
using EnergyStatistics
x = [1.0, 2.0]
EnergyStatistics.DistanceMatrix(x)
# output
2×2 EnergyStatistics.DistanceMatrix{Float64}:
0.0 1.0
1.0 0.0
EnergyStatistics.dcenter! - Method
dcenter!(A::DistanceMatrix{T}) where {T <: Real}
Computes the double-centered matrix of A in place.
EnergyStatistics.ucenter! - Method
ucenter!(A::DistanceMatrix{T}) where {T <: Real}
Computes the U-centered matrix of A in place.
EnergyStatistics.dcor - Method
dcor(A::DistanceMatrix{T}, B::DistanceMatrix{T})
Computes the distance correlation of two centered DistanceMatrix objects A and B.
EnergyStatistics.dcov - Method
dcov(A::DistanceMatrix{T}, B::DistanceMatrix{T}) where {T <: Real}
Computes the distance covariance of two centered DistanceMatrix objects A and B.
EnergyStatistics.dvar - Method
dvar(A::DistanceMatrix{T}) where {T <: Real}
Computes the distance variance of a centered DistanceMatrix A. Stores the variance alongside the DistanceMatrix for future use.