CausalityTools.CausalityTools
— ModuleCausalityTools
CausalityTools.jl is a package for quantifying associations, independence testing and causal inference.
All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl
file.
Key features
- Association API: includes measures and their estimators for pairwise, conditional and other forms of association from conventional statistics, from dynamical systems theory, and from information theory: partial correlation, distance correlation, (conditional) mutual information, transfer entropy, convergent cross mapping and a lot more!
- Independence testing API, which is automatically compatible with every association measure estimator implemented in the package.
- Causal (network) inference API integrating the association measures and independence testing framework.
Addititional features
Extending on features from ComplexityMeasures.jl, we also offer
- Discretization API for multiple (multivariate) input datasets.
- Multivariate counting and probability estimation API.
- Multivariate information measure API
Installation
To install the package, run import Pkg; Pkg.add("CausalityTools")
.
CausalityTools.Amplitude
— TypeAmplitude <: InstantaneousSignalProperty
Indicates that the instantaneous amplitudes of a signal should be used.
CausalityTools.AssociationMeasure
— TypeAssociationMeasure
The supertype of all association measures.
Abstract implementations
Currently, the association measures are classified by abstract classes listed below. These abstract classes offer common functionality among association measures that are conceptually similar. This makes maintenance and framework extension easier than if each measure was implemented "in isolation".
Concrete implementations
Concrete subtypes are given as input to association
. Many of these types require an AssociationMeasureEstimator
to compute.
Type | AssociationMeasure | Pairwise | Conditional |
---|---|---|---|
Correlation | PearsonCorrelation | ✓ | ✖ |
Correlation | DistanceCorrelation | ✓ | ✓ |
Closeness | SMeasure | ✓ | ✖ |
Closeness | HMeasure | ✓ | ✖ |
Closeness | MMeasure | ✓ | ✖ |
Closeness (ranks) | LMeasure | ✓ | ✖ |
Closeness | JointDistanceDistribution | ✓ | ✖ |
Cross-mapping | PairwiseAsymmetricInference | ✓ | ✖ |
Cross-mapping | ConvergentCrossMapping | ✓ | ✖ |
Conditional recurrence | MCR | ✓ | ✖ |
Conditional recurrence | RMCD | ✓ | ✓ |
Shared information | MIShannon | ✓ | ✖ |
Shared information | MIRenyiJizba | ✓ | ✖ |
Shared information | MIRenyiSarbu | ✓ | ✖ |
Shared information | MITsallisFuruichi | ✓ | ✖ |
Shared information | PartialCorrelation | ✖ | ✓ |
Shared information | CMIShannon | ✖ | ✓ |
Shared information | CMIRenyiSarbu | ✖ | ✓ |
Shared information | CMIRenyiJizba | ✖ | ✓ |
Shared information | CMIRenyiPoczos | ✖ | ✓ |
Shared information | CMITsallisPapapetrou | ✖ | ✓ |
Information transfer | TEShannon | ✓ | ✓ |
Information transfer | TERenyiJizba | ✓ | ✓ |
Partial mutual information | PartialMutualInformation | ✖ | ✓ |
Information measure | JointEntropyShannon | ✓ | ✖ |
Information measure | JointEntropyRenyi | ✓ | ✖ |
Information measure | JointEntropyTsallis | ✓ | ✖ |
Information measure | ConditionalEntropyShannon | ✓ | ✖ |
Information measure | ConditionalEntropyTsallisAbe | ✓ | ✖ |
Information measure | ConditionalEntropyTsallisFuruichi | ✓ | ✖ |
Divergence | HellingerDistance | ✓ | ✖ |
Divergence | KLDivergence | ✓ | ✖ |
Divergence | RenyiDivergence | ✓ | ✖ |
Divergence | VariationDistance | ✓ | ✖ |
CausalityTools.AssociationMeasureEstimator
— TypeAssociationMeasureEstimator
The supertype of all association measure estimators.
Concrete subtypes are given as input to association
.
Abstract subtypes
Concrete implementations
CausalityTools.BivariateInformationMeasure
— TypeBivariateInformationMeasure <: MultivariateInformationMeasure
The supertype of all bivariate information measure definitions.
CausalityTools.CMIDecomposition
— TypeCMIDecomposition(definition::MultivariateInformationMeasure,
est::ConditionalMutualInformationEstimator)
Estimate some multivariate information measure specified by definition
, by decomposing it into a combination of conditional mutual information terms.
Usage
- Use with
association
to compute aMultivariateInformationMeasure
from input data:association
(est::CMIDecomposition, x...)
. - Use with some
IndependenceTest
to test for independence between variables.
Description
Each of the conditional mutual information terms are estimated using est
, which can be any ConditionalMutualInformationEstimator
. Finally, these estimates are combined according to the relevant decomposition formula.
This estimator is similar to EntropyDecomposition
, but definition
is expressed as conditional mutual information terms instead of entropy terms.
Examples
- Example 1: Estimating
TEShannon
by decomposing it intoCMIShannon
which is estimated using theFPVP
estimator.
See also: ConditionalMutualInformationEstimator
, MultivariateInformationMeasureEstimator
, MultivariateInformationMeasure
.
CausalityTools.CMIRenyiJizba
— TypeCMIRenyiJizba <: ConditionalMutualInformation
CMIRenyiJizba(; base = 2, q = 1.5)
The Rényi conditional mutual information $I_q^{R_{J}}(X; Y | Z)$ defined in Jizba2012.
Usage
- Use with
association
to compute the raw Rényi-Jizba conditional mutual information using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Jizba conditional mutual information.
Compatible estimators
Definition
\[I_q^{R_{J}}(X; Y | Z) = I_q^{R_{J}}(X; Y, Z) - I_q^{R_{J}}(X; Z),\]
where $I_q^{R_{J}}(X; Z)$ is the MIRenyiJizba
mutual information.
Estimation
- Example 1:
JointProbabilities
withBubbleSortSwaps
outcome space. - Example 2:
EntropyDecomposition
withOrdinalPatterns
outcome space. - Example 3:
EntropyDecomposition
with differential entropy estimatorLeonenkoProzantoSavani
.
CausalityTools.CMIRenyiPoczos
— TypeCMIRenyiPoczos <: ConditionalMutualInformation
CMIRenyiPoczos(; base = 2, q = 1.5)
The differential Rényi conditional mutual information $I_q^{R_{P}}(X; Y | Z)$ defined in Poczos2012.
Usage
- Use with
association
to compute the raw Rényi-Poczos conditional mutual information using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Poczos conditional mutual information.
Compatible estimators
Definition
\[\begin{align*} I_q^{R_{P}}(X; Y | Z) &= \dfrac{1}{q-1} \int \int \int \dfrac{p_Z(z) p_{X, Y | Z}^q}{( p_{X|Z}(x|z) p_{Y|Z}(y|z) )^{q-1}} \\ &= \mathbb{E}_{X, Y, Z} \sim p_{X, Y, Z} \left[ \dfrac{p_{X, Z}^{1-q}(X, Z) p_{Y, Z}^{1-q}(Y, Z) }{p_{X, Y, Z}^{1-q}(X, Y, Z) p_Z^{1-q}(Z)} \right] \end{align*}\]
Estimation
- Example 1: Dedicated
PoczosSchneiderCMI
estimator.
CausalityTools.CMIRenyiSarbu
— TypeCMIRenyiSarbu <: ConditionalMutualInformation
CMIRenyiSarbu(; base = 2, q = 1.5)
The Rényi conditional mutual information from Sarbu2014.
Usage
- Use with
association
to compute the raw Rényi-Sarbu conditional mutual information using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Sarbu conditional mutual information.
Compatible estimators
Discrete description
Assume we observe three discrete random variables $X$, $Y$ and $Z$. Sarbu (2014) defines discrete conditional Rényi mutual information as the conditional Rényi $\alpha$-divergence between the conditional joint probability mass function $p(x, y | z)$ and the product of the conditional marginals, $p(x |z) \cdot p(y|z)$:
\[I(X, Y; Z)^R_q = \dfrac{1}{q-1} \sum_{z \in Z} p(Z = z) \log \left( \sum_{x \in X}\sum_{y \in Y} \dfrac{p(x, y|z)^q}{\left( p(x|z)\cdot p(y|z) \right)^{q-1}} \right)\]
CausalityTools.CMIShannon
— TypeCMIShannon <: ConditionalMutualInformation
CMIShannon(; base = 2)
The Shannon conditional mutual information (CMI) $I^S(X; Y | Z)$.
Usage
- Use with
association
to compute the raw Shannon conditional mutual information using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise conditional independence using the Shannon conditional mutual information.
Compatible estimators
JointProbabilities
EntropyDecomposition
MIDecomposition
FPVP
MesnerShalizi
Rahimzamani
PoczosSchneiderCMI
GaussianCMI
Supported definitions
Consider random variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$, given $Z \in \mathbb{R}^{d_Z}$. The Shannon conditional mutual information is defined as
\[\begin{align*} I(X; Y | Z) &= H^S(X, Z) + H^S(Y, z) - H^S(X, Y, Z) - H^S(Z) \\ &= I^S(X; Y, Z) + I^S(X; Y) \end{align*},\]
where $I^S(\cdot; \cdot)$ is the Shannon mutual information MIShannon
, and $H^S(\cdot)$ is the Shannon
entropy.
Differential Shannon CMI is obtained by replacing the entropies by differential entropies.
Estimation
- Example 1:
EntropyDecomposition
withKraskov
estimator. - Example 2:
EntropyDecomposition
withValueBinning
estimator. - Example 3:
MIDecomposition
withKraskovStögbauerGrassberger1
estimator.
CausalityTools.CMITsallisPapapetrou
— TypeCMITsallisPapapetrou <: ConditionalMutualInformation
CMITsallisPapapetrou(; base = 2, q = 1.5)
The Tsallis-Papapetrou conditional mutual information Papapetrou2020.
Usage
- Use with
association
to compute the raw Tsallis-Papapetrou conditional mutual information using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise conditional independence using the Tsallis-Papapetrou conditional mutual information.
Compatible estimators
Definition
Tsallis-Papapetrou conditional mutual information is defined as
\[I_T^q(X, Y \mid Z) = \frac{1}{1 - q} \left( 1 - \sum_{XYZ} \frac{p(x, y, z)^q}{p(x \mid z)^{q-1} p(y \mid z)^{q-1} p(z)^{q-1}} \right).\]
CausalityTools.ClosenessMeasure
— TypeClosenessMeasure <: AssociationMeasure
The supertype for all multivariate information-based measure definitions.
Implementations
CausalityTools.CodifyPoints
— TypeCodifyPoints{N}
CodifyPoints(encodings::NTuple{N, Encoding})
CodifyPoints
points is a Discretization
scheme that encodes input data points without applying any sequential transformation to the input (as opposed to CodifyVariables
, which may apply some transformation before encoding).
Usage
- Use with
codify
to encode/discretize input variable on a point-by-point basis.
Compatible encodings
GaussianCDFEncoding
OrdinalPatternEncoding
RelativeMeanEncoding
RelativeFirstDifferenceEncoding
UniqueElementsEncoding
RectangularBinEncoding
CombinationEncoding
Description
Given x::AbstractStateSpaceSet...
, where the i
-th dataset is assumed to represent a single series of measurements, CodifyPoints
encodes each point pₖ ∈ x[i]
using some Encoding
(s), without applying any (sequential) transformation to the x[i]
first. This behaviour is different to CodifyVariables
, which does apply a transformation to x[i]
before encoding.
If length(x) == N
(i.e. there are N
input dataset), then encodings
must be a tuple of N
Encoding
. Alternatively, if encodings
is a single Encoding
, then that same encoding is applied to every x[i]
.
Examples
using CausalityTools
# The same encoding on two input datasets
x = StateSpaceSet(rand(100, 3))
y = StateSpaceSet(rand(100, 3))
encoding_ord = OrdinalPatternEncoding(3)
cx, cy = codify(CodifyPoints(encoding_ord), x, y)
# Different encodings on multiple datasets
z = StateSpaceSet(rand(100, 2))
encoding_bin = RectangularBinEncoding(RectangularBinning(3), z)
d = CodifyPoints(encoding_ord, encoding_ord, encoding_bin)
cx, cy, cz = codify(d, x, y, z)
CausalityTools.CodifyVariables
— TypeCodifyVariables <: Discretization
CodifyVariables(outcome_space::OutcomeSpace)
The CodifyVariables
discretization scheme quantises input data in a column-wise manner using the given outcome_space
.
Compatible outcome spaces
UniqueElements
(for when data are pre-discretized)BubbleSortSwaps
CosineSimilarityBinning
OrdinalPatterns
Dispersion
Description
The main difference between CodifyVariables
and [CodifyPoints
] is that the former uses OutcomeSpace
s for discretization. This usually means that some transformation is applied to the data before discretizing. For example, some outcome constructs a delay embedding from the input (and thus encodes sequential information) before encoding the data.
Specifically, given x::AbstractStateSpaceSet...
, where the i
-th dataset x[i]
is assumed to represent a single series of measurements, CodifyVariables
encodes x[i]
by codify
-ing into a series of integers using an appropriate OutcomeSpace
. This is typically done by first sequentially transforming the data and then running sliding window (the width of the window is controlled by outcome_space
) across the data, and then encoding the values within each window to an integer.
Examples
using CausalityTools
x, y = rand(100), rand(100)
d = CodifyVariables(OrdinalPatterns(m=2))
cx, cy = codify(d, x, y)
CausalityTools.ConditionalEntropy
— TypeConditionalEntropy <: MultivariateInformationMeasure
The supertype for all conditional entropy measures.
Concrete subtypes
CausalityTools.ConditionalEntropyShannon
— TypeConditionalEntropyShannon <: ConditionalEntropy
ConditionalEntropyShannon(; base = 2)
The Shannon
conditional entropy measure.
Usage
- Use with
association
to compute the Shannon conditional entropy between two variables.
Compatible estimators
Discrete definition
Sum formulation
The conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as
\[H^{S}(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log(p(x | y)).\]
This is the definition used when calling association
with a JointProbabilities
estimator.
Two-entropies formulation
Equivalently, the following differenConditionalEntropy of entropies hold
\[H^S(X | Y) = H^S(X, Y) - H^S(Y),\]
where $H^S(\cdot)$ and $H^S(\cdot | \cdot)$ are the Shannon
entropy and Shannon joint entropy, respectively. This is the definition used when calling association
with a ProbabilitiesEstimator
.
Differential definition
The differential conditional Shannon entropy is analogously defined as
\[H^S(X | Y) = h^S(X, Y) - h^S(Y),\]
where $h^S(\cdot)$ and $h^S(\cdot | \cdot)$ are the Shannon
differential entropy and Shannon joint differential entropy, respectively. This is the definition used when calling association
with a DifferentialInfoEstimator
.
Estimation
- Example 1: Analytical example from Cover & Thomas's book.
- Example 2:
JointProbabilities
estimator withCodifyVariables
discretization andUniqueElements
outcome space on categorical data. - Example 3:
JointProbabilities
estimator withCodifyPoints
discretization andUniqueElementsEncoding
encoding of points on numerical data.
CausalityTools.ConditionalEntropyTsallisAbe
— TypeConditionalEntropyTsallisAbe <: ConditionalEntropy
ConditionalEntropyTsallisAbe(; base = 2, q = 1.5)
Abe2001's discrete Tsallis conditional entropy measure.
Usage
- Use with
association
to compute the Tsallis-Abe conditional entropy between two variables.
Compatible estimators
Definition
Abe & Rajagopal's Tsallis conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as
\[H_q^{T_A}(X | Y) = \dfrac{H_q^T(X, Y) - H_q^T(Y)}{1 + (1-q)H_q^T(Y)},\]
where $H_q^T(\cdot)$ and $H_q^T(\cdot, \cdot)$ is the Tsallis
entropy and the joint Tsallis entropy.
Estimation
- Example 1:
JointProbabilities
estimator withCodifyVariables
discretization andUniqueElements
outcome space on categorical data. - Example 2:
JointProbabilities
estimator withCodifyPoints
discretization andUniqueElementsEncoding
encoding of points on numerical data.
CausalityTools.ConditionalEntropyTsallisFuruichi
— TypeConditionalEntropyTsallisFuruichi <: ConditionalEntropy
ConditionalEntropyTsallisFuruichi(; base = 2, q = 1.5)
Furuichi (2006)'s discrete Tsallis conditional entropy definition.
Usage
- Use with
association
to compute the Tsallis-Furuichi conditional entropy between two variables.
Compatible estimators
Definition
Furuichi's Tsallis conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as
\[H_q^T(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y)^q \log_q(p(x | y)),\]
$\ln_q(x) = \frac{x^{1-q} - 1}{1 - q}$ and $q \neq 1$. For $q = 1$, $H_q^T(X | Y)$ reduces to the Shannon conditional entropy:
\[H_{q=1}^T(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} = p(x, y) \log(p(x | y))\]
If any of the entries of the marginal distribution for Y
are zero, or the q-logarithm is undefined for a particular value, then the measure is undefined and NaN
is returned.
Estimation
- Example 1:
JointProbabilities
estimator withCodifyVariables
discretization andUniqueElements
outcome space on categorical data. - Example 2:
JointProbabilities
estimator withCodifyPoints
discretization andUniqueElementsEncoding
encoding of points on numerical data.
CausalityTools.ConditionalMutualInformation
— TypeCondiitionalMutualInformation
Abstract type for all mutual information measures.
Concrete implementations
See also: ConditionalMutualInformationEstimator
CausalityTools.ConditionalMutualInformationEstimator
— TypeConditionalMutualInformationEstimator
The supertype for dedicated ConditionalMutualInformation
estimators.
Concrete implementations
CausalityTools.ConvergentCrossMapping
— TypeConvergentCrossMapping <: CrossmapMeasure
ConvergentCrossMapping(; d::Int = 2, τ::Int = -1, w::Int = 0,
f = Statistics.cor, embed_warn = true)
The convergent cross mapping measure Sugihara2012.
Usage
- Use with
association
together with aCrossmapEstimator
to compute the cross-map correlation between input variables.
Compatible estimators
Description
The Theiler window w
controls how many temporal neighbors are excluded during neighbor searches (w = 0
means that only the point itself is excluded). f
is a function that computes the agreement between observations and predictions (the default, f = Statistics.cor
, gives the Pearson correlation coefficient).
Embedding
Let S(i)
be the source time series variable and T(i)
be the target time series variable. This version produces regular embeddings with fixed dimension d
and embedding lag τ
as follows:
\[( S(i), S(i+\tau), S(i+2\tau), \ldots, S(i+(d-1)\tau, T(i))_{i=1}^{N-(d-1)\tau}.\]
In this joint embedding, neighbor searches are performed in the subspace spanned by the first D-1
variables, while the last (D
-th) variable is to be predicted.
With this convention, τ < 0
implies "past/present values of source used to predict target", and τ > 0
implies "future/present values of source used to predict target". The latter case may not be meaningful for many applications, so by default, a warning will be given if τ > 0
(embed_warn = false
turns off warnings).
Estimation
- Example 1. Estimation with
RandomVectors
estimator. - Example 2. Estimation with
RandomSegment
estimator. - Example 3: Reproducing figures from Sugihara2012.
CausalityTools.CorrTest
— TypeCorrTest <: IndependenceTest
CorrTest()
An independence test based correlation (for two variables) and partial correlation (for three variables) Levy1978; as described in Schmidt2018.
Uses PearsonCorrelation
and PartialCorrelation
internally.
Assumes that the input data are (multivariate) normally distributed. Then ρ(X, Y) = 0
implies X ⫫ Y
and ρ(X, Y | 𝐙) = 0
implies X ⫫ Y | 𝐙
.
Description
The null hypothesis is H₀ := ρ(X, Y | 𝐙) = 0
. We use the approach in Levy & Narula (1978)Levy1978 and compute the Z-transformation of the observed (partial) correlation coefficient $\hat{\rho}_{XY|\bf{Z}}$:
\[Z(\hat{\rho}_{XY|\bf{Z}}) = \log\dfrac{1 + \hat{\rho}_{XY|\bf{Z}}}{1 - \hat{\rho}_{XY|\bf{Z}}}.\]
To test the null hypothesis against the alternative hypothesis H₁ := ρ(X, Y | 𝐙) > 0
, calculate
\[\hat{Z} = \dfrac{1}{2}\dfrac{Z(\hat{\rho}_{XY|\bf{Z}}) - Z(0)}{\sqrt{1/(n - d - 3)}},\]
and compute the two-sided p-value (Schmidt et al., 2018)
\[p(X, Y | \bf{Z}) = 2(1 - \phi(\sqrt{n - d - 3}Z(\hat{\rho}_{XY|\bf{Z}}))),\]
where $d$ is the dimension of $\bf{Z}$ and $n$ is the number of samples. For the pairwise case, the procedure is identical, but set $\bf{Z} = \emptyset$.
Examples
- Example 1. Pairwise and conditional tests for independence on coupled noise processes.
CausalityTools.CorrTestResult
— TypeCorrTestResult(pvalue, ρ, z)
A simple struct that holds the results of a CorrTest
test: the (partial) correlation coefficient ρ
, Fisher's z
, and pvalue
- the two-sided p-value for the test.
CausalityTools.CorrelationMeasure
— TypeCorrelationMeasure <: AssociationMeasure end
The supertype for correlation measures.
Concrete implementations
CausalityTools.CrossmapEstimator
— TypeCrossmapEstimator{M<:CrossmapMeasure, LIBSIZES, RNG}
The abstract supertype for all cross-map estimators.
Concrete subtypes
Description
Because the type of the library may differ between estimators, and because RNGs from different packages may be used, subtypes must implement the LIBSIZES
and RNG
type parameters.
For efficiency purposes, subtypes may contain mutable containers that can be re-used for ensemble analysis (see Ensemble
).
A cross-map estimator uses the concept of "libraries". A library is essentially just a reference to a set of points, and usually, a library refers to indices of points, not the actual points themselves.
For example, for timeseries, RandomVectors(libsizes = 50:25:100)
produces three separate libraries, where the first contains 50 randomly selected time indices, the second contains 75 randomly selected time indices, and the third contains 100 randomly selected time indices. This of course assumes that all quantities involved can be indexed using the same time indices, meaning that the concept of "library" only makes sense after relevant quantities have been jointly embedded, so that they can be jointly indexed. For non-instantaneous prediction, the maximum possible library size shrinks with the magnitude of the index/time-offset for the prediction.
For spatial analyses (not yet implemented), indices could be more complex and involve multi-indices.
CausalityTools.CrossmapMeasure
— TypeCrossmapMeasure <: AssociationMeasure
The supertype for all cross-map measures. Concrete subtypes are
ConvergentCrossMapping
, orCCM
for short.PairwiseAsymmetricInference
, orPAI
for short.
See also: CrossmapEstimator
.
CausalityTools.Discretization
— TypeDiscretization
The supertype of all discretization schemes.
Concrete implementations
CausalityTools.DistanceCorrelation
— TypeDistanceCorrelation
The distance correlation Szekely2007 measure quantifies potentially nonlinear associations between pairs of variables. If applied to three variables, the partial distance correlation Szekely2014 is computed.
Usage
- Use with
association
to compute the raw (partial) distance correlation coefficient. - Use with
independence
to perform a formal hypothesis test for pairwise dependence.
Description
The distance correlation can be used to compute the association between two variables, or the conditional association between three variables, like so:
association(DistanceCorrelation(), x, y) → dcor ∈ [0, 1]
association(DistanceCorrelation(), x, y, z) → pdcor
With two variable, we comptue dcor
, which is called the empirical/sample distance correlation Szekely2007. With three variables, the partial distance correlation pdcor
is computed Szekely2014.
A partial distance correlation distance_correlation(X, Y, Z) = 0
doesn't always guarantee conditional independence X ⫫ Y | Z
. Szekely2014 for an in-depth discussion.
CausalityTools.DivergenceOrDistance
— TypeDivergenceOrDistance <: BivariateInformationMeasure
The supertype for bivariate information measures aiming to quantify some sort of divergence, distance or closeness between two probability distributions.
Some of these measures are proper metrics, while others are not, but they have in common that they aim to quantify how "far from each other" two probabilities distributions are.
Concrete implementations
CausalityTools.EmbeddingTE
— TypeEmbeddingTE(; dS = 1, dT = 1, dTf = 1, dC = 1, τS = -1, τT = -1, ηTf = 1, τC = -1)
EmbeddingTE(opt::OptimiseTraditional, s, t, [c])
EmbeddingTE
provide embedding parameters for transfer entropy analysis using either TEShannon
, TERenyiJizba
, or in general any subtype of TransferEntropy
.
The second method finds parameters using the "traditional" optimised embedding techniques from DynamicalSystems.jl
Convention for generalized delay reconstruction
We use the following convention. Let $s(i)$ be time series for the source variable, $t(i)$ be the time series for the target variable and $c(i)$ the time series for the conditional variable. To compute transfer entropy, we need the following marginals:
\[\begin{aligned} T^{+} &= \{t(i+\eta^1), t(i+\eta^2), \ldots, (t(i+\eta^{d_{T^{+}}}) \} \\ T^{-} &= \{ (t(i+\tau^0_{T}), t(i+\tau^1_{T}), t(i+\tau^2_{T}), \ldots, t(t + \tau^{d_{T} - 1}_{T})) \} \\ S^{-} &= \{ (s(i+\tau^0_{S}), s(i+\tau^1_{S}), s(i+\tau^2_{S}), \ldots, s(t + \tau^{d_{S} - 1}_{S})) \} \\ C^{-} &= \{ (c(i+\tau^0_{C}), c(i+\tau^1_{C}), c(i+\tau^2_{C}), \ldots, c(t + \tau^{d_{C} - 1}_{C})) \} \end{aligned}\]
Depending on the application, the delay reconstruction lags $\tau^k_{T} \leq 0$, $\tau^k_{S} \leq 0$, and $\tau^k_{C} \leq 0$ may be equally spaced, or non-equally spaced. The same applied to the prediction lag(s), but typically only a only a single predictions lag $\eta^k$ is used (so that $d_{T^{+}} = 1$).
For transfer entropy, traditionally at least one $\tau^k_{T}$, one $\tau^k_{S}$ and one $\tau^k_{C}$ equals zero. This way, the $T^{-}$, $S^{-}$ and $C^{-}$ marginals always contains present/past states, while the $\mathcal T$ marginal contain future states relative to the other marginals. However, this is not a strict requirement, and modern approaches that searches for optimal embeddings can return embeddings without the intantaneous lag.
Combined, we get the generalized delay reconstruction $\mathbb{E} = (T^{+}_{(d_{T^{+}})}, T^{-}_{(d_{T})}, S^{-}_{(d_{S})}, C^{-}_{(d_{C})})$. Transfer entropy is then computed as
\[\begin{aligned} TE_{S \rightarrow T | C} = \int_{\mathbb{E}} P(T^{+}, T^-, S^-, C^-) \log_{b}{\left(\frac{P(T^{+} | T^-, S^-, C^-)}{P(T^{+} | T^-, C^-)}\right)}, \end{aligned}\]
or, if conditionals are not relevant,
\[\begin{aligned} TE_{S \rightarrow T} = \int_{\mathbb{E}} P(T^{+}, T^-, S^-) \log_{b}{\left(\frac{P(T^{+} | T^-, S^-)}{P(T^{+} | T^-)}\right)}, \end{aligned}\]
Here,
- $T^{+}$ denotes the $d_{T^{+}}$-dimensional set of vectors furnishing the future states of $T$ (almost always equal to 1 in practical applications),
- $T^{-}$ denotes the $d_{T}$-dimensional set of vectors furnishing the past and present states of $T$,
- $S^{-}$ denotes the $d_{S}$-dimensional set of vectors furnishing the past and present of $S$, and
- $C^{-}$ denotes the $d_{C}$-dimensional set of vectors furnishing the past and present of $C$.
Keyword arguments
dS
,dT
,dC
,dTf
(f
for future) are the dimensions of the $S^{-}$, $T^{-}$, $C^{-}$ and $T^{+}$ marginals. The parametersdS
,dT
,dC
anddTf
must each be a positive integer number.τS
,τT
,τC
are the embedding lags for $S^{-}$, $T^{-}$, $C^{-}$. Each parameter are integers∈ 𝒩⁰⁻
, or a vector of integers∈ 𝒩⁰⁻
, so that $S^{-}$, $T^{-}$, $C^{-}$ always represents present/past values. If e.g.τT
is an integer, then for the $T^-$ marginal is constructed using lags $\tau_{T} = \{0, \tau, 2\tau, \ldots, (d_{T}- 1)\tau_T \}$. If is a vector, e.g.τΤ = [-1, -5, -7]
, then the dimensiondT
must match the lags, and precisely those lags are used: $\tau_{T} = \{-1, -5, -7 \}$.- The prediction lag(s)
ηTf
is a positive integer. Combined with the requirement that the other delay parameters are zero or negative, this ensures that we're always predicting from past/present to future. In typical applications,ηTf = 1
is used for transfer entropy.
Examples
Say we wanted to compute the Shannon transfer entropy $TE^S(S \to T) = I^S(T^+; S^- | T^-)$. Using some modern procedure for determining optimal embedding parameters using methods from DynamicalSystems.jl, we find that the optimal embedding of $T^{-}$ is three-dimensional and is given by the lags [0, -5, -8]
. Using the same procedure, we find that the optimal embedding of $S^{-}$ is two-dimensional with lags $[-1, -8]$. We want to predicting a univariate version of the target variable one time step into the future (ηTf = 1
). The total embedding is then the set of embedding vectors
$E_{TE} = \{ (T(i+1), S(i-1), S(i-8), T(i), T(i-5), T(i-8)) \}$. Translating this to code, we get:
using CausalityTools
julia> EmbeddingTE(dT=3, τT=[0, -5, -8], dS=2, τS=[-1, -4], ηTf=1)
# output
EmbeddingTE(dS=2, dT=3, dC=1, dTf=1, τS=[-1, -4], τT=[0, -5, -8], τC=-1, ηTf=1)
CausalityTools.Ensemble
— TypeEnsemble(est::CrossmapEstimator{<:CrossmapMeasure}, nreps::Int = 100)
A directive to compute an ensemble analysis, where measure
(e.g. ConvergentCrossMapping
) is computed using the given estimator est
(e.g. RandomVectors
)
Examples
- Example 1: Reproducing Figure 3A from Sugihara2012.
CausalityTools.EntropyDecomposition
— TypeEntropyDecomposition(definition::MultivariateInformationMeasure,
est::DifferentialInfoEstimator)
EntropyDecomposition(definition::MultivariateInformationMeasure,
est::DiscreteInfoEstimator,
discretization::CodifyVariables{<:OutcomeSpace},
pest::ProbabilitiesEstimator = RelativeAmount())
Estimate the multivariate information measure specified by definition
by rewriting its formula into some combination of entropy terms.
If calling the second method (discrete variant), then discretization is always done per variable/column and each column is encoded into integers using codify
.
Usage
- Use with
association
to compute aMultivariateInformationMeasure
from input data:association
(est::EntropyDecomposition, x...)
. - Use with some
IndependenceTest
to test for independence between variables.
Description
The entropy terms are estimated using est
, and then combined to form the final estimate of definition
. No bias correction is applied. If est
is a DifferentialInfoEstimator
, then discretization
and pest
are ignored. If est
is a DiscreteInfoEstimator
, then discretization
and a probabilities estimator pest
must also be provided (default to RelativeAmount
, which uses naive plug-in probabilities).
Compatible differential information estimators
If using the first signature, any compatible DifferentialInfoEstimator
can be used.
Compatible outcome spaces for discrete estimation
If using the second signature, the outcome spaces can be used for discretisation. Note that not all outcome spaces will work with all measures.
Estimator | Principle |
---|---|
UniqueElements | Count of unique elements |
ValueBinning | Binning (histogram) |
OrdinalPatterns | Ordinal patterns |
Dispersion | Dispersion patterns |
BubbleSortSwaps | Sorting complexity |
CosineSimilarityBinning | Cosine similarities histogram |
Bias
Estimating the definition
by decomposition into a combination of entropy terms, which are estimated independently, will in general be more biased than when using a dedicated estimator. One reason is that this decomposition may miss out on crucial information in the joint space. To remedy this, dedicated information measure estimators typically derive the marginal estimates by first considering the joint space, and then does some clever trick to eliminate the bias that is introduced through a naive decomposition. Unless specified below, no bias correction is applied for EntropyDecomposition
.
Handling of overlapping parameters
If there are overlapping parameters between the measure to be estimated, and the lower-level decomposed measures, then the top-level measure parameter takes precedence. For example, if we want to estimate CMIShannon(base = 2)
through a decomposition of entropies using the Kraskov(Shannon(base = ℯ))
Shannon entropy estimator, then base = 2
is used.
Not all measures have the property that they can be decomposed into more fundamental information theoretic quantities. For example, MITsallisMartin
can be decomposed into a combination of marginal entropies, while MIRenyiSarbu
cannot. An error will be thrown if decomposition is not possible.
Discrete entropy decomposition
The second signature is for discrete estimation using DiscreteInfoEstimator
s, for example PlugIn
. The given discretization
scheme (typically an OutcomeSpace
) controls how the joint/marginals are discretized, and the probabilities estimator pest
controls how probabilities are estimated from counts.
Like for DifferentialInfoEstimator
, using a dedicated estimator for the measure in question will be more reliable than using a decomposition estimate. Here's how different discretization
s are applied:
ValueBinning
. Bin visitation frequencies are counted in the joint spaceXY
, then marginal visitations are obtained from the joint bin visits. This behaviour is the same for bothFixedRectangularBinning
andRectangularBinning
(which adapts the grid to the data). When usingFixedRectangularBinning
, the range along the first dimension is used as a template for all other dimensions. This is a bit slower than naively binning each marginal, but lessens bias.OrdinalPatterns
. Each timeseries is separatelycodify
-ed according to its ordinal pattern (no bias correction).Dispersion
. Each timeseries is separatelycodify
-ed according to its dispersion pattern (no bias correction).
Examples
- Example 1:
MIShannon
estimation using decomposition into discreteShannon
entropy estimated usingCodifyVariables
withValueBinning
. - Example 2:
MIShannon
estimation using decomposition into discreteShannon
entropy estimated usingCodifyVariables
withBubbleSortSwaps
. - Example 3:
MIShannon
estimation using decomposition into differentalShannon
entropy estimated using theKraskov
estimator.
See also: MutualInformationEstimator
, MultivariateInformationMeasure
.
CausalityTools.ExpandingSegment
— TypeExpandingSegment <: CrossmapEstimator
ExpandingSegment(definition::CrossmapMeasure; libsizes, rng = Random.default_rng())
Cross map once over N = length(libsizes)
different "point libraries", where point indices are selected as time-contiguous segments/windows.
This is the method from Sugihara2012. See CrossmapEstimator
for an in-depth explanation of what "library" means in this context.
Description
Point index segments are selected as first available data point index, up to the L
th data point index. This results in one library of contiguous time indices per L ∈ libsizes
.
If used in an ensemble setting, the estimator is applied to time indices Lmin:step:Lmax
of the joint embedding.
Returns
The return type when used with association
depends on the type of libsizes
.
- If
libsizes
is anInt
(a single library), then a single cross-map estimate is returned. - If
libsizes
is anAbstractVector{Int}
(multiple libraries), then a vector of cross-map estimates is returned –- one per library.
CausalityTools.FPVP
— TypeFPVP <: ConditionalMutualInformationEstimator
FPVP(definition = CMIShannon(); k = 1, w = 0)
The Frenzel-Pompe-Vejmelka-Paluš (or FPVP
for short) ConditionalMutualInformationEstimator
is used to estimate the conditional mutual information using a k
-th nearest neighbor approach that is analogous to that of the KraskovStögbauerGrassberger1
mutual information estimator from Frenzel2007 and Vejmelka2008.
k
is the number of nearest neighbors. w
is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with
association
to computeConditionalMutualInformation
measure from input data. - Use with some
IndependenceTest
to test for independence between variables.
Examples
- Example 1: Estimating
CMIShannon
CausalityTools.GaoKannanOhViswanath
— TypeGaoKannanOhViswanath <: MutualInformationEstimator
GaoKannanOhViswanath(; k = 1, w = 0)
The GaoKannanOhViswanath
(Shannon) estimator is designed for estimating Shannon mutual information between variables that may be either discrete, continuous or a mixture of both GaoKannanOhViswanath2017.
Compatible definitions
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
The estimator starts by expressing mutual information in terms of the Radon-Nikodym derivative, and then estimates these derivatives using k
-nearest neighbor distances from empirical samples.
The estimator avoids the common issue of having to add noise to data before analysis due to tied points, which may bias other estimators. Citing their paper, the estimator "strongly outperforms natural baselines of discretizing the mixed random variables (by quantization) or making it continuous by adding a small Gaussian noise."
In GaoKannanOhViswanath2017, they claim (roughly speaking) that the estimator reduces to the KraskovStögbauerGrassberger1
estimator for continuous-valued data. However, KraskovStögbauerGrassberger1
uses the digamma function, while GaoKannanOhViswanath
uses the logarithm instead, so the estimators are not exactly equivalent for continuous data.
Moreover, in their algorithm 1, it is clearly not the case that the method falls back on the KraskovStögbauerGrassberger1
approach. The KraskovStögbauerGrassberger1
estimator uses k
-th neighbor distances in the joint space, while the GaoKannanOhViswanath
algorithm selects the maximum k
-th nearest distances among the two marginal spaces, which are in general not the same as the k
-th neighbor distance in the joint space (unless both marginals are univariate). Therefore, our implementation here differs slightly from algorithm 1 in GaoKannanOhViswanath
. We have modified it in a way that mimics KraskovStögbauerGrassberger1
for continous data. Note that because of using the log
function instead of digamma
, there will be slight differences between the methods. See the source code for more details.
Even though the GaoKannanOhViswanath
estimator is designed to handle discrete data, our implementation demands that all input data are StateSpaceSet
s whose data points are floats. If you have discrete data, such as strings or symbols, encode them using integers and convert those integers to floats before passing them to association
.
Examples
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoKannanOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)
CausalityTools.GaoOhViswanath
— TypeGaoOhViswanath <: MutualInformationEstimator
The GaoOhViswanath
is a mutual information estimator based on nearest neighbors, and is also called the bias-improved-KSG estimator, or BI-KSG, by Gao2018.
Compatible definitions
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
The estimator is given by
\[\begin{align*} \hat{H}_{GAO}(X, Y) &= \hat{H}_{KSG}(X) + \hat{H}_{KSG}(Y) - \hat{H}_{KZL}(X, Y) \\ &= \psi{(k)} + \log{(N)} + \log{ \left( \dfrac{c_{d_{x}, 2} c_{d_{y}, 2}}{c_{d_{x} + d_{y}, 2}} \right) } - \\ & \dfrac{1}{N} \sum_{i=1}^N \left( \log{(n_{x, i, 2})} + \log{(n_{y, i, 2})} \right) \end{align*},\]
where $c_{d, 2} = \dfrac{\pi^{\frac{d}{2}}}{\Gamma{(\dfrac{d}{2} + 1)}}$ is the volume of a $d$-dimensional unit $\mathcal{l}_2$-ball.
Example
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)
CausalityTools.GaussianCMI
— TypeGaussianCMI <: MutualInformationEstimator
GaussianCMI(definition = CMIShannon(); normalize::Bool = false)
GaussianCMI
is a parametric ConditionalMutualInformationEstimator
Vejmelka2008.
Compatible definitions
Usage
- Use with
association
to computeCMIShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
GaussianCMI
estimates Shannon CMI through a sum of two mutual information terms that each are estimated using GaussianMI
(the normalize
keyword is the same as for GaussianMI
):
\[\hat{I}_{Gaussian}(X; Y | Z) = \hat{I}_{Gaussian}(X; Y, Z) - \hat{I}_{Gaussian}(X; Z)\]
Examples
- Example 1. Estimating
CMIShannon
.
CausalityTools.GaussianMI
— TypeGaussianMI <: MutualInformationEstimator
GaussianMI(; normalize::Bool = false)
GaussianMI
is a parametric estimator for Shannon mutual information.
Compatible definitions
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
Given $d_x$-dimensional and $d_y$-dimensional input data X
and Y
, GaussianMI
first constructs the $d_x + d_y$-dimensional joint StateSpaceSet
XY
. If normalize == true
, then we follow the approach in Vejmelka & Palus (2008)Vejmelka2008 and transform each column in XY
to have zero mean and unit standard deviation. If normalize == false
, then the algorithm proceeds without normalization.
Next, the C_{XY}
, the correlation matrix for the (normalized) joint data XY
is computed. The mutual information estimate GaussianMI
assumes the input variables are distributed according to normal distributions with zero means and unit standard deviations. Therefore, given $d_x$-dimensional and $d_y$-dimensional input data X
and Y
, GaussianMI
first constructs the joint StateSpaceSet
XY
, then transforms each column in XY
to have zero mean and unit standard deviation, and finally computes the \Sigma
, the correlation matrix for XY
.
The mutual information estimated (for normalize == false
) is then estimated as
\[\hat{I}^S_{Gaussian}(X; Y) = \dfrac{1}{2} \dfrac{ \det(\Sigma_X) \det(\Sigma_Y)) }{\det(\Sigma))},\]
where we $\Sigma_X$ and $\Sigma_Y$ appear in $\Sigma$ as
\[\Sigma = \begin{bmatrix} \Sigma_{X} & \Sigma^{'}\\ \Sigma^{'} & \Sigma_{Y} \end{bmatrix}.\]
If normalize == true
, then the mutual information is estimated as
\[\hat{I}^S_{Gaussian}(X; Y) = -\dfrac{1}{2} \sum_{i = 1}^{d_x + d_y} \sigma_i,\]
where $\sigma_i$ are the eigenvalues for $\Sigma$.
Example
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaussianMI(), x, y) # should be near 0 (and can be negative)
CausalityTools.GraphAlgorithm
— TypeCausalityTools.HMeasure
— TypeHMeasure <: AssociationMeasure
HMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)
The HMeasure
Arnhold1999 is a pairwise association measure. It quantifies the probability with which close state of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.
Note that τx
and τy
are negative by convention. See docstring for SMeasure
for an explanation.
Usage
- Use with
association
to compute the raw h-measure statistic. - Use with
independence
to perform a formal hypothesis test for directional dependence.
Description
The HMeasure
Arnhold1999 is similar to the SMeasure
, but the numerator of the formula is replaced by $R_i(x)$, the mean squared Euclidean distance to all other points, and there is a $\log$-term inside the sum:
\[H^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{R_i(x)}{R_i^{(k)}(x|y)} \right).\]
Parameters are the same and $R_i^{(k)}(x|y)$ is computed as for SMeasure
.
See also: ClosenessMeasure
.
CausalityTools.HellingerDistance
— TypeHellingerDistance <: DivergenceOrDistance
The Hellinger distance.
Usage
- Use with
association
to compute the compute the Hellinger distance between two pre-computed probability distributions, or from raw data using one of the estimators listed below.
Compatible estimators
Description
The Hellinger distance between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace
$\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as
\[D_{H}(P_Y(\Omega) || P_Y(\Omega)) = \dfrac{1}{\sqrt{2}} \sum_{\omega \in \Omega} (\sqrt{p_x(\omega)} - \sqrt{p_y(\omega)})^2\]
Estimation
- Example 1: From precomputed probabilities
- Example 2:
JointProbabilities
withOrdinalPatterns
outcome space
CausalityTools.Hilbert
— TypeHilbert(est;
source::InstantaneousSignalProperty = Phase(),
target::InstantaneousSignalProperty = Phase(),
cond::InstantaneousSignalProperty = Phase())
) <: TransferDifferentialEntropyEstimator
Compute transfer entropy on instantaneous phases/amplitudes of relevant signals, which are obtained by first applying the Hilbert transform to each signal, then extracting the phases/amplitudes of the resulting complex numbers Palus2014. Original time series are thus transformed to instantaneous phase/amplitude time series. Transfer entropy is then estimated using the provided est
on those phases/amplitudes (use e.g. ValueBinning
, or OrdinalPatterns
).
Details on estimation of the transfer entropy (conditional mutual information) following the phase/amplitude extraction step is not given in Palus (2014). Here, after instantaneous phases/amplitudes have been obtained, these are treated as regular time series, from which transfer entropy is then computed as usual.
CausalityTools.IndependenceTest
— TypeIndependenceTest <: IndependenceTest
The supertype for all independence tests.
CausalityTools.JDDTestResult
— TypeJDDTestResult(Δjdd, hypothetical_μ, pvalue)
Holds the results of JointDistanceDistributionTest
. Δjdd
is the Δ
-distribution, hypothetical_μ
is the hypothetical mean of the Δ
-distribution under the null, and pvalue
is the p-value for the one-sided t-test.
CausalityTools.JointDistanceDistribution
— TypeJointDistanceDistribution <: AssociationMeasure end
JointDistanceDistribution(; metric = Euclidean(), B = 10, D = 2, τ = -1, μ = 0.0)
The joint distance distribution (JDD) measure Amigo2018.
Usage
- Use with
association
to compute the joint distance distribution measureΔ
from Amigo2018. - Use with
independence
to perform a formal hypothesis test for directional dependence.
Keyword arguments
distance_metric::Metric
: An instance of a valid distance metric fromDistances.jl
. Defaults toEuclidean()
.B::Int
: The number of equidistant subintervals to divide the interval[0, 1]
into when comparing the normalised distances.D::Int
: Embedding dimension.τ::Int
: Embedding delay. By convention,τ
is negative.μ
: The hypothetical mean value of the joint distance distribution if there is no coupling betweenx
andy
(default isμ = 0.0
).
Description
From input time series $x(t)$ and $y(t)$, we first construct the delay embeddings (note the positive sign in the embedding lags; therefore the input parameter τ
is by convention negative).
\[\begin{align*} \{\bf{x}_i \} &= \{(x_i, x_{i+\tau}, \ldots, x_{i+(d_x - 1)\tau}) \} \\ \{\bf{y}_i \} &= \{(y_i, y_{i+\tau}, \ldots, y_{i+(d_y - 1)\tau}) \} \\ \end{align*}\]
The algorithm then proceeds to analyze the distribution of distances between points of these embeddings, as described in Amigo2018.
Examples
CausalityTools.JointDistanceDistributionTest
— TypeJointDistanceDistributionTest <: IndependenceTest
JointDistanceDistributionTest(measure::JointDistanceDistribution; rng = Random.default_rng())
An independence test for two variables based on the JointDistanceDistribution
Amigo2018.
When used with independence
, a JDDTestResult
is returned.
Description
The joint distance distribution (labelled Δ
in their paper) is used by Amigó & Hirata (2018) to detect directional couplings of the form $X \to Y$ or $Y \to X$. JointDistanceDistributionTest
formulates their method as an independence test.
Formally, we test the hypothesis $H_0$ (the variables are independent) against $H_1$ (there is directional coupling between the variables). To do so, we use a right-sided/upper-tailed t-test to check mean of Δ
is skewed towards positive value, i.e.
- $H_0 := \mu(\Delta) = 0$
- $H_1 := \mu(\Delta) > 0$.
When used with independence
, a JDDTestResult
is returned, which contains the joint distance distribution and a p-value. If you only need Δ
, use association
with a JointDistanceDistribution
instance directly.
Examples
- Example 1. Detecting (in)dependence in bidirectionally coupled logistic maps.
CausalityTools.JointEntropy
— TypeJointEntropy <: BivariateInformationMeasure
The supertype for all joint entropy measures.
Concrete implementations
CausalityTools.JointEntropyRenyi
— TypeJointEntropyRenyi <: JointEntropy
JointEntropyRenyi(; base = 2, q = 1.5)
The Rényi joint entropy measure Golshani2009.
Usage
- Use with
association
to compute the Golshani-Rényi joint entropy between two variables.
Compatible estimators
Definition
Given two two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{X}$, Golshani2009 defines the Rényi joint entropy as
\[H_q^R(X, Y) = \dfrac{1}{1-\alpha} \log \sum_{i = 1}^N p_i^q,\]
where $q > 0$ and $q != 1$.
Estimation
- Example 1:
JointProbabilities
withValueBinning
outcome space
CausalityTools.JointEntropyShannon
— TypeJointEntropyShannon <: JointEntropy
JointEntropyShannon(; base = 2)
The Shannon joint entropy measure CoverThomas1999.
Usage
- Use with
association
to compute the Shannon joint entropy between two variables.
Compatible estimators
Definition
Given two two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{X}$, CoverThomas1999 defines the Shannon joint entropy as
\[H^S(X, Y) = -\sum_{x\in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log p(x, y),\]
where we define $log(p(x, y)) := 0$ if $p(x, y) = 0$.
Estimation
- Example 1:
JointProbabilities
withDispersion
outcome space
CausalityTools.JointEntropyTsallis
— TypeJointEntropyTsallis <: JointEntropy
JointEntropyTsallis(; base = 2, q = 1.5)
The Tsallis joint entropy definition from Furuichi2006.
Usage
- Use with
association
to compute the Furuichi-Tsallis joint entropy between two variables.
Compatible estimators
Definition
Given two two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{X}$, Furuichi2006 defines the Tsallis joint entropy as
\[H_q^T(X, Y) = -\sum_{x\in \mathcal{X}, y \in \mathcal{Y}} p(x, y)^q \log_q p(x, y),\]
where $log_q(x, q) = \dfrac{x^{1-q} - 1}{1-q}$ is the q-logarithm, and we define $log_q(x, q) := 0$ if $q = 0$.
Estimation
- Example 1:
JointProbabilities
withOrdinalPatterns
outcome space
CausalityTools.JointProbabilities
— TypeJointProbabilities <: InformationMeasureEstimator
JointProbabilities(
definition::MultivariateInformationMeasure,
discretization::Discretization
)
JointProbabilities
is a generic estimator for multivariate discrete information measures.
Usage
- Use with
association
to compute an information measure from input data.
Description
It first encodes the input data according to the given discretization
, then constructs probs
, a multidimensional Probabilities
instance. Finally, probs
are forwarded to a PlugIn
estimator, which computes the measure according to definition
.
Compatible encoding schemes
CodifyVariables
(encode each variable/column of the input data independently by applying an encoding in a sliding window over each input variable).CodifyPoints
(encode each point/column of the input data)
Works for any OutcomeSpace
that implements codify
.
Using JointProbabilities
to compute an information measure, e.g. conditional mutual estimation, is typically slower than other dedicated estimation procedures like EntropyDecomposition
. The reason is that measures such as CMIShannon
can be formulated as a sum of four entropies, which can be estimated individually and summed afterwards. This decomposition is fast because because we avoid explicitly estimating the entire joint pmf, which demands many extra calculation steps, However, the decomposition is biased, because it fails to fully take into consideration the joint relationships between the variables. Pick your estimator according to your needs.
See also: Counts
, Probabilities
, ProbabilitiesEstimator
, OutcomeSpace
, DiscreteInfoEstimator
.
CausalityTools.KLDivergence
— TypeKLDivergence <: DivergenceOrDistance
The Kullback-Leibler (KL) divergence.
Usage
- Use with
association
to compute the compute the KL-divergence between two pre-computed probability distributions, or from raw data using one of the estimators listed below.
Compatible estimators
Estimators
Description
The KL-divergence between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace
$\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as
\[D_{KL}(P_Y(\Omega) || P_Y(\Omega)) = \sum_{\omega \in \Omega} p_x(\omega) \log\dfrac{p_x(\omega)}{p_y(\omega)}\]
Implements
association
. Used to compute the KL-divergence between two pre-computed probability distributions. If used withRelativeAmount
, the KL divergence may be undefined to due some outcomes having zero counts. Use some otherProbabilitiesEstimator
likeBayesianRegularization
to ensure all estimated probabilities are nonzero.
Distances.jl also defines KLDivergence
. Quality it if you're loading both packages, i.e. do association(CausalityTools.KLDivergence(), x, y)
.
Estimation
- Example 1: From precomputed probabilities
- Example 2:
JointProbabilities
withOrdinalPatterns
outcome space
CausalityTools.KraskovStögbauerGrassberger1
— TypeKSG1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1(; k::Int = 1, w = 0, metric_marginals = Chebyshev())
The KraskovStögbauerGrassberger1
mutual information estimator (you can use KSG1
for short) is the $I^{(1)}$ k
-th nearest neighbor estimator from Kraskov2004.
Compatible definitions
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Keyword arguments
k::Int
: The number of nearest neighbors to consider. Only information about thek
-th nearest neighbor is actually used.metric_marginals
: The distance metric for the marginals for the marginals can be any metric fromDistances.jl
. It defaults tometric_marginals = Chebyshev()
, which is the same as in Kraskov et al. (2004).w::Int
: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to0
, meaning that only the point itself is excluded.
Example
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG1(; k = 10), x, y) # should be near 0 (and can be negative)
CausalityTools.KraskovStögbauerGrassberger2
— TypeKSG2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2(; k::Int = 1, w = 0, metric_marginals = Chebyshev())
The KraskovStögbauerGrassberger2
mutual information estimator (you can use KSG2
for short) is the $I^{(2)}$ k
-th nearest neighbor estimator from Kraskov2004.
Compatible definitions
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Keyword arguments
k::Int
: The number of nearest neighbors to consider. Only information about thek
-th nearest neighbor is actually used.metric_marginals
: The distance metric for the marginals for the marginals can be any metric fromDistances.jl
. It defaults tometric_marginals = Chebyshev()
, which is the same as in Kraskov et al. (2004).w::Int
: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to0
, meaning that only the point itself is excluded.
Description
Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.
The KraskovStögbauerGrassberger2
estimator first locates, for each $\bf{x}_i \in X$, the point $\bf{n}_i \in X$, the k
-th nearest neighbor to $\bf{x}_i$, according to the maximum norm (Chebyshev
metric). Let $\epsilon_i$ be the distance $d(\bf{x}_i, \bf{n}_i)$.
Consider $x_i^m \in \bf{X}_m$, the $i$-th point in the marginal space $\bf{X}_m$. For each $\bf{x}_i^m$, we determine $\theta_i^m$ := the number of points $\bf{x}_k^m \in \bf{X}_m$ that are a distance less than $\epsilon_i$ away from $\bf{x}_i^m$. That is, we use the distance from a query point $\bf{x}_i \in X$ (in the joint space) to count neighbors of $x_i^m \in \bf{X}_m$ (in the marginal space).
Shannon mutual information between the variables $\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m$ is then estimated as
\[\hat{I}_{KSG2}(\bf{X}) = \psi{(k)} - \dfrac{m - 1}{k} + (m - 1)\psi{(N)} - \dfrac{1}{N} \sum_{i = 1}^N \sum_{j = 1}^m \psi{(\theta_i^j + 1)}\]
Example
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG2(; k = 10), x, y) # should be near 0 (and can be negative)
CausalityTools.LMeasure
— TypeLMeasure <: ClosenessMeasure
LMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)
The LMeasure
Chicharro2009 is a pairwise association measure. It quantifies the probability with which close state of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.
Note that τx
and τy
are negative by convention. See docstring for SMeasure
for an explanation.
Usage
- Use with
association
to compute the raw L-measure statistic. - Use with
independence
to perform a formal hypothesis test for directional dependence.
Description
LMeasure
is similar to MMeasure
, but uses distance ranks instead of the raw distances.
Let $\bf{x_i}$ be an embedding vector, and let $g_{i,j}$ denote the rank that the distance between $\bf{x_i}$ and some other vector $\bf{x_j}$ in a sorted ascending list of distances between $\bf{x_i}$ and $\bf{x_{i \neq j}}$ In other words, $g_{i,j}$ this is just the $N-1$ nearest neighbor distances sorted )
LMeasure
is then defined as
\[L^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{G_i(x) - G_i^{(k)}(x|y)}{G_i(x) - G_i^k(x)} \right),\]
where $G_i(x) = \frac{N}{2}$ and $G_i^K(x) = \frac{k+1}{2}$ are the mean and minimal rank, respectively.
The $y$-conditioned mean rank is defined as
\[G_i^{(k)}(x|y) = \dfrac{1}{K}\sum_{j=1}^{K} g_{i,w_{i, j}},\]
where $w_{i,j}$ is the index of the $j$-th nearest neighbor of $\bf{y_i}$.
See also: ClosenessMeasure
.
CausalityTools.Lindner
— TypeLindner <: TransferEntropyEstimator
Lindner(definition = Shannon(); k = 1, w = 0, base = 2)
The Lindner
transfer entropy estimator Lindner2011, which is also used in the Trentool MATLAB toolbox, and is based on nearest neighbor searches.
Compatible definitions
Usage
- Use with
association
to computeTEShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Keyword parameters
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
The estimator can be used both for pairwise and conditional transfer entropy estimation.
Description
For a given points in the joint embedding space jᵢ
, this estimator first computes the distance dᵢ
from jᵢ
to its k
-th nearest neighbor. Then, for each point mₖ[i]
in the k
-th marginal space, it counts the number of points within radius dᵢ
.
The Shannon transfer entropy is then computed as
\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]
where the index k
references the three marginal subspaces T
, TTf
and ST
for which neighbor searches are performed. Here this estimator has been modified to allow for conditioning too (a simple modification to Lindner2011's equation 5 and 6).
Example
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Lindner(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)
CausalityTools.LocalPermutationClosenessSearch
— TypeThe supertype of all types indicating a way of determining "closeness" for the local permutation algorithm.
CausalityTools.LocalPermutationTest
— TypeLocalPermutationTest <: IndependenceTest
LocalPermutationTest(measure, [est];
kperm::Int = 5,
nshuffles::Int = 100,
rng = Random.default_rng(),
replace = true,
w::Int = 0,
show_progress = false)
LocalPermutationTest
is a generic conditional independence test Runge2018LocalPerm for assessing whether two variables X
and Y
are conditionally independendent given a third variable Z
(all of which may be multivariate).
When used with independence
, a LocalPermutationTestResult
is returned.
Description
This is a generic one-sided hypothesis test that checks whether X
and Y
are independent (given Z
, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between x
and y
, but preserve dependencies between x
and z
.
The algorithm is as follows:
- Compute the original conditional independence statistic
I(X; Y | Z)
. - Allocate a scalar valued vector
Î
with space fornshuffles
elements. - For
k ∈ [1, 2, …, nshuffles]
, repeat- For each
zᵢ ∈ Y
, letnᵢ
be time indices of thekperm
nearest neighbors ofzᵢ
, excluding thew
nearest neighbors ofzᵢ
from the neighbor query (i.ew
is the Theiler window). - Let
xᵢ⋆ = X[j]
, wherej
is randomly sampled fromnᵢ
with replacement. This way,xᵢ
is replaced withxⱼ
only ifzᵢ ≈ zⱼ
(zᵢ
andzⱼ
are close). Repeat fori = 1, 2, …, n
and obtain the shuffledX̂ = [x̂₁, x̂₂, …, x̂ₙ]
. - Compute the conditional independence statistic
Iₖ(X̂; Y | Z)
. - Let
Î[k] = Iₖ(X̂; Y | Z)
.
- For each
- Compute the p-value as
count(Î[k] .<= I) / nshuffles)
.
In additional to the conditional variant from Runge (2018), we also provide a pairwise version, where the shuffling procedure is identical, except neighbors in Y
are used instead of Z
and we I(X; Y)
and Iₖ(X̂; Y)
instead of I(X; Y | Z)
and Iₖ(X̂; Y | Z)
.
Compatible measures
Measure | Pairwise | Conditional | Requires est | Note |
---|---|---|---|---|
PartialCorrelation | ✖ | ✓ | No | |
DistanceCorrelation | ✖ | ✓ | No | |
CMIShannon | ✖ | ✓ | Yes | |
TEShannon | ✓ | ✓ | Yes | Pairwise tests not possible with TransferEntropyEstimator s, only lower-level estimators, e.g. FPVP , GaussianMI or Kraskov |
PartialMutualInformation | ✖ | ✓ | Yes |
The LocalPermutationTest
is only defined for conditional independence testing. Exceptions are for measures like TEShannon
, which use conditional measures under the hood even for their pairwise variants, and are therefore compatible with LocalPermutationTest
.
The nearest-neighbor approach in Runge (2018) can be reproduced by using the CMIShannon
measure with the FPVP
estimator.
Examples
- Example 1: Conditional independence test using
CMIShannon
- Example 2): Conditional independence test using
TEShannon
CausalityTools.LocalPermutationTestResult
— TypeLocalPermutationTestResult(m, m_surr, pvalue)
Holds the result of a LocalPermutationTest
. m
is the measure computed on the original data. m_surr
is a vector of the measure computed on permuted data, where m_surr[i]
is the measure compute on the i
-th permutation. pvalue
is the one-sided p
-value for the test.
CausalityTools.MCR
— TypeMCR <: AssociationMeasure
MCR(; r, metric = Euclidean())
An association measure based on mean conditional probabilities of recurrence (MCR) introduced by Romano2007.
Usage
- Use with
association
to compute the raw MCR for pairwise or conditional association. - Use with
IndependenceTest
to perform a formal hypothesis test for pairwise or conditional association.
Description
r
is mandatory keyword which specifies the recurrence threshold when constructing recurrence matrices. It can be instance of any subtype of AbstractRecurrenceType
from RecurrenceAnalysis.jl. To use any r
that is not a real number, you have to do using RecurrenceAnalysis
first. The metric
is any valid metric from Distances.jl.
For input variables X
and Y
, the conditional probability of recurrence is defined as
\[M(X | Y) = \dfrac{1}{N} \sum_{i=1}^N p(\bf{y_i} | \bf{x_i}) = \dfrac{1}{N} \sum_{i=1}^N \dfrac{\sum_{i=1}^N J_{R_{i, j}}^{X, Y}}{\sum_{i=1}^N R_{i, j}^X},\]
where $R_{i, j}^X$ is the recurrence matrix and $J_{R_{i, j}}^{X, Y}$ is the joint recurrence matrix, constructed using the given metric
. The measure $M(Y | X)$ is defined analogously.
Romano2007's interpretation of this quantity is that if X
drives Y
, then M(X|Y) > M(Y|X)
, if Y
drives X
, then M(Y|X) > M(X|Y)
, and if coupling is symmetric, then M(Y|X) = M(X|Y)
.
Input data
X
and Y
can be either both univariate timeseries, or both multivariate StateSpaceSet
s.
Estimation
- Example 1. Pairwise versus conditional MCR.
CausalityTools.MIDecomposition
— TypeMIDecomposition(definition::MultivariateInformationMeasure,
est::MutualInformationEstimator)
Estimate the MultivariateInformationMeasure
specified by definition
by by decomposing, the measure, if possible, into a combination of mutual information terms. These terms are individually estimated using the given MutualInformationEstimator
est
, and finally combined to form the final value of the measure.
Usage
- Use with
association
to compute aMultivariateInformationMeasure
from input data:association
(est::MIDecomposition, x...)
. - Use with some
IndependenceTest
to test for independence between variables.
Examples
- Example 1: Estimating
CMIShannon
using a decomposition intoMIShannon
terms using theKraskovStögbauerGrassberger1
mutual information estimator.
See also: MultivariateInformationMeasureEstimator
.
CausalityTools.MIRenyiJizba
— TypeMIRenyiJizba <: <: BivariateInformationMeasure
MIRenyiJizba(; q = 1.5, base = 2)
The Rényi mutual information $I_q^{R_{J}}(X; Y)$ defined in Jizba2012.
Usage
- Use with
association
to compute the raw Rényi-Jizba mutual information from input data using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise dependence using the Rényi-Jizba mutual information.
Compatible estimators
Definition
\[I_q^{R_{J}}(X; Y) = H_q^{R}(X) + H_q^{R}(Y) - H_q^{R}(X, Y),\]
where $H_q^{R}(\cdot)$ is the Rényi
entropy.
Estimation
- Example 1:
JointProbabilities
withUniqueElements
outcome space. - Example 2:
EntropyDecomposition
withLeonenkoProzantoSavani
. - Example 3:
EntropyDecomposition
withValueBinning
.
CausalityTools.MIRenyiSarbu
— TypeMIRenyiSarbu <: BivariateInformationMeasure
MIRenyiSarbu(; base = 2, q = 1.5)
The discrete Rényi mutual information from Sarbu2014.
Usage
- Use with
association
to compute the raw Rényi-Sarbu mutual information from input data using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise dependence using the Rényi-Sarbu mutual information.
Compatible estimators
Description
Sarbu (2014) defines discrete Rényi mutual information as the Rényi $\alpha$-divergence between the conditional joint probability mass function $p(x, y)$ and the product of the conditional marginals, $p(x) \cdot p(y)$:
\[I(X, Y)^R_q = \dfrac{1}{q-1} \log \left( \sum_{x \in X, y \in Y} \dfrac{p(x, y)^q}{\left( p(x)\cdot p(y) \right)^{q-1}} \right)\]
Estimation
- Example 1:
JointProbabilities
withUniqueElements
for categorical data. - Example 2:
JointProbabilities
withCosineSimilarityBinning
for numerical data.
CausalityTools.MIShannon
— TypeMIShannon <: BivariateInformationMeasure
MIShannon(; base = 2)
The Shannon mutual information $I_S(X; Y)$.
Usage
- Use with
association
to compute the raw Shannon mutual information from input data using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise dependence using the Shannon mutual information.
Compatible estimators
JointProbabilities
(generic)EntropyDecomposition
(generic)KraskovStögbauerGrassberger1
KraskovStögbauerGrassberger2
GaoOhViswanath
GaoKannanOhViswanath
GaussianMI
Discrete definition
There are many equivalent formulations of discrete Shannon mutual information, meaning that it can be estimated in several ways, either using JointProbabilities
(double-sum formulation), EntropyDecomposition
(three-entropies decomposition), or some dedicated estimator.
Double sum formulation
Assume we observe samples $\bar{\bf{X}}_{1:N_y} = \{\bar{\bf{X}}_1, \ldots, \bar{\bf{X}}_n \}$ and $\bar{\bf{Y}}_{1:N_x} = \{\bar{\bf{Y}}_1, \ldots, \bar{\bf{Y}}_n \}$ from two discrete random variables $X$ and $Y$ with finite supports $\mathcal{X} = \{ x_1, x_2, \ldots, x_{M_x} \}$ and $\mathcal{Y} = y_1, y_2, \ldots, x_{M_y}$. The double-sum estimate is obtained by replacing the double sum
\[\hat{I}_{DS}(X; Y) = \sum_{x_i \in \mathcal{X}, y_i \in \mathcal{Y}} p(x_i, y_j) \log \left( \dfrac{p(x_i, y_i)}{p(x_i)p(y_j)} \right)\]
where $\hat{p}(x_i) = \frac{n(x_i)}{N_x}$, $\hat{p}(y_i) = \frac{n(y_j)}{N_y}$, and $\hat{p}(x_i, x_j) = \frac{n(x_i)}{N}$, and $N = N_x N_y$. This definition is used by association
when called with a JointProbabilities
estimator.
Three-entropies formulation
An equivalent formulation of discrete Shannon mutual information is
\[I^S(X; Y) = H^S(X) + H_q^S(Y) - H^S(X, Y),\]
where $H^S(\cdot)$ and $H^S(\cdot, \cdot)$ are the marginal and joint discrete Shannon entropies. This definition is used by association
when called with a EntropyDecomposition
estimator and a discretization.
Differential mutual information
One possible formulation of differential Shannon mutual information is
\[I^S(X; Y) = h^S(X) + h_q^S(Y) - h^S(X, Y),\]
where $h^S(\cdot)$ and $h^S(\cdot, \cdot)$ are the marginal and joint differential Shannon entropies. This definition is used by association
when called with EntropyDecomposition
estimator and a DifferentialInfoEstimator
.
Estimation
- Example 1:
JointProbabilities
withValueBinning
outcome space. - Example 2:
JointProbabilities
withUniqueElements
outcome space on string data. - Example 3: Dedicated
GaussianMI
estimator. - Example 4: Dedicated
KraskovStögbauerGrassberger1
estimator. - Example 5: Dedicated
KraskovStögbauerGrassberger2
estimator. - Example 6: Dedicated
GaoKannanOhViswanath
estimator. - Example 7:
EntropyDecomposition
withKraskov
estimator. - Example 8:
EntropyDecomposition
withBubbleSortSwaps
. - Example 9:
EntropyDecomposition
withJackknife
estimator andValueBinning
outcome space. - Example 10: Reproducing Kraskov et al. (2004).
CausalityTools.MITsallisFuruichi
— TypeMITsallisFuruichi <: BivariateInformationMeasure
MITsallisFuruichi(; base = 2, q = 1.5)
The discrete Tsallis mutual information from Furuichi (2006)Furuichi2006, which in that paper is called the mutual entropy.
Usage
- Use with
association
to compute the raw Tsallis-Furuichi mutual information from input data using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise dependence using the Tsallis-Furuichi mutual information.
Compatible estimators
Description
Furuichi's Tsallis mutual entropy between variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$ is defined as
\[I_q^T(X; Y) = H_q^T(X) - H_q^T(X | Y) = H_q^T(X) + H_q^T(Y) - H_q^T(X, Y),\]
where $H^T(\cdot)$ and $H^T(\cdot, \cdot)$ are the marginal and joint Tsallis entropies, and q
is the Tsallis
-parameter.
Estimation
- Example 1:
JointProbabilities
withUniqueElements
outcome space. - Example 2:
EntropyDecomposition
withLeonenkoProzantoSavani
estimator. - Example 3:
EntropyDecomposition
withDispersion
CausalityTools.MITsallisMartin
— TypeMITsallisMartin <: BivariateInformationMeasure
MITsallisMartin(; base = 2, q = 1.5)
The discrete Tsallis mutual information from Martin2004.
Usage
- Use with
association
to compute the raw Tsallis-Martin mutual information from input data using of of the estimators listed below. - Use with
independence
to perform a formal hypothesis test for pairwise dependence using the Tsallis-Martin mutual information.
Compatible estimators
Description
Martin et al.'s Tsallis mutual information between variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$ is defined as
\[I_{\text{Martin}}^T(X, Y, q) := H_q^T(X) + H_q^T(Y) - (1 - q) H_q^T(X) H_q^T(Y) - H_q(X, Y),\]
where $H^S(\cdot)$ and $H^S(\cdot, \cdot)$ are the marginal and joint Shannon entropies, and q
is the Tsallis
-parameter.
Estimation
- Example 1:
JointProbabilities
withUniqueElements
outcome space. - Example 2:
EntropyDecomposition
withLeonenkoProzantoSavani
estimator. - Example 3:
EntropyDecomposition
withOrdinalPatterns
outcome space.
CausalityTools.MMeasure
— TypeMMeasure <: ClosenessMeasure
MMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)
The MMeasure
Andrzejak2003 is a pairwise association measure. It quantifies the probability with which close state of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.
Note that τx
and τy
are negative by convention. See docstring for SMeasure
for an explanation.
Usage
- Use with
association
to compute the raw m-measure statistic. - Use with
independence
to perform a formal hypothesis test for directional dependence.
Description
The MMeasure
is based on SMeasure
and HMeasure
. It is given by
\[M^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{R_i(x) - R_i^{(k)}(x|y)}{R_i(x) - R_i^k(x)} \right),\]
where $R_i(x)$ is computed as for HMeasure
, while $R_i^k(x)$ and $R_i^{(k)}(x|y)$ is computed as for SMeasure
. Parameters also have the same meaning as for SMeasure
/HMeasure
.
See also: ClosenessMeasure
.
CausalityTools.MesnerShalizi
— TypeMesnerShalizi <: ConditionalMutualInformationEstimator
MesnerShalizi(definition = CMIShannon(); k = 1, w = 0)
The MesnerShalizi
ConditionalMutualInformationEstimator
is designed for data that can be mixtures of discrete and continuous data Mesner2020.
k
is the number of nearest neighbors. w
is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with
association
to computeCMIShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Examples
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(MesnerShalizi(; k = 10), x, z, y) # should be near 0 (and can be negative)
CausalityTools.MultivariateInformationMeasure
— TypeMultivariateInformationMeasure <: AssociationMeasure
The supertype for all multivariate information-based measure definitions.
Definition
Following Datseris2024, we define a multivariate information measure as any functional of a multidimensional probability mass functions (PMFs) or multidimensional probability density.
Implementations
JointEntropy
definitions:
ConditionalEntropy
definitions:
DivergenceOrDistance
definitions:
MutualInformation
definitions:
ConditionalMutualInformation
definitions:
TransferEntropy
definitions:
Other definitions:
CausalityTools.MultivariateInformationMeasureEstimator
— TypeMultivariateInformationMeasureEstimator
The supertype for all estimators of multivariate information measures.
Generic implementations
Dedicated implementations
KraskovStögbauerGrassberger1
KraskovStögbauerGrassberger2
GaoOhViswanath
GaoKannanOhViswanath
GaussianMI
CausalityTools.MutualInformation
— TypeMutualInformation
Abstract type for all mutual information measures.
Concrete implementations
See also: MutualInformationEstimator
CausalityTools.MutualInformationEstimator
— TypeMutualInformationEstimator
The supertype for dedicated MutualInformation
estimators.
Concrete implementations
CausalityTools.NeighborCloseness
— TypeNeighborCloseness <: LocalPermutationClosenessSearch
Determine closeness between points for LocalPermutationTest
using nearest neighbors searches.
CausalityTools.OCE
— TypeOCE <: GraphAlgorithm
OCE(; utest::IndependenceTest = SurrogateAssociationTest(MIShannon(), KSG2(k = 3, w = 3)),
ctest::C = LocalPermutationTest(CMIShannon(), MesnerShalizi(k = 3, w = 3)),
τmax::T = 1, α = 0.05)
The optimal causation entropy (OCE) algorithm for causal discovery Sun2015.
Description
The OCE algorithm has three steps to determine the parents of a variable xᵢ
.
- Perform pairwise independence tests using
utest
and select the variablexⱼ(-τ)
that has the highest significant (i.e. with associated p-value belowα
) association withxᵢ(0)
. Assign it to the set of selected parentsP
. - Perform conditional independence tests using
ctest
, finding the parentPₖ
that has the highest association withxᵢ
given the already selected parents, and add it toP
. Repeat until no more variables with significant association are found. - Backwards elimination of parents
Pₖ
ofxᵢ(0)
for whichxᵢ(0) ⫫ Pₖ | P - {Pₖ}
, whereP
is the set of parent nodes found in the previous steps.
τmax
indicates the maximum lag τ
between the target variable xᵢ(0)
and its potential parents xⱼ(-τ)
. Sun et al. 2015's method is based on τmax = 1
.
Returns
When used with infer_graph
, it returns a vector p
, where p[i]
are the parents for each input variable. This result can be converted to a SimpleDiGraph
from Graphs.jl (see example).
Usage
OCE
is used with infer_graph
to infer the parents of the input data. Input data must either be a Vector{Vector{<:Real}}
, or a StateSpaceSet
.
Examples
CausalityTools.OCESelectedParents
— TypeOCESelectedParents
A simple struct for storing the parents of a single variable xᵢ
inferred by the OCE
algorithm. When using OCE
with infer_graph
, a Vector{OCESelectedParents}
is returned - one per variable in the input data.
Assumptions and notation
Assumes the input x
is a Vector{Vector{<:Real}}
or a StateSpaceSet
(for which each column is treated as a variable). It contains the following fields, where we use the notation xₖ(τ)
to indicate the k
-th variable lagged by time-lag τ
. For example, x₂(-3)
is the variable x[2]
lagged by 3 time steps.
Fields
i
: The index of the target variable (i.e.xᵢ(0)
is the target).all_idxs
: The possible variable indices of parent variables (i.e.1:M
, whereM
is the number of input variables).parents_js
: The variable indices of the selected parent variables –- one per selected parent.parents_τs
: The lags for the selected parent variables –- one per selected parent.parents
: A vector containing the raw, time-lagged data for each selected parent variables. Letτ = parents_τs[k]
andj = parents_js[k]
. Thenparents[k]
is the raw data for the variablexⱼ(-τ)
.
CausalityTools.OptimiseTraditional
— TypeOptimiseTraditional(; η = 1, maxlag = 50, maxdim = 10,
method = delay_ifnn, dmethod = "mi_min")
Optimize embedding parameters using traditional delay embedding optimization methods with a maximum lag of maxlag
and a maximum dimension of maxdim
. method
can be either delay_ifnn
, delay_fnn
or delay_f1nn
.
CausalityTools.PC
— TypePC <: GraphAlgorithm
PC(pairwise_test, conditional_test;
α = 0.05, max_depth = Inf, maxiters_orient = Inf)
The PC algorithm Spirtes2000, which is named named after the first names of the authors, Peter Spirtes and Clark Glymour, which is implemented as described in Kalisch2008.
Arguments
pairwise_test
: AnIndependenceTest
that uses a pairwise, nondirectionalAssociationMeasure
measure (e.g. a parametricCorrTest
, orSurrogateAssociationTest
with theMIShannon
measure).conditional_test
: AnIndependenceTest
that uses a conditional, nondirectionalAssociationMeasure
(e.g.CorrTest
, orSurrogateAssociationTest
with theCMIShannon
measure).
Keyword arguments
α::Real
. The significance level of the test.max_depth
. The maximum level of conditional indendence tests to be performed. By default, there is no limit (i.e.max_depth = Inf
), meaning that maximum depth isN - 2
, whereN
is the number of input variables.maxiters_orient::Real
. The maximum number of times to apply the orientation rules. By default, there is not limit (i.e.maxiters_orient = Inf
).
During the skeleton search phase, if a significance association between two nodes are is found, then a bidirectional edge is drawn between them. The generic implementation of PC
therefore doesn't currently handle directional measures such as TEShannon
. The reason is that if a directional relationship X → Y
exists between two nodes X
and Y
, then the algorithm would first draw a bidirectional arrow between X
and Y
when analysing the direction X → Y
, and then removing it again when analysing in the direction Y → X
(a similar situation would also occur for the conditional stage). This will be fixed in a future release. For now, use nondirectional measures, e.g. MIShannon
and CMIShannon
!
Description
When used with infer_graph
on some input data x
, the PC
algorithm performs the following steps:
- Initialize an empty fully connected graph
g
withN
nodes, whereN
is the number of variables andx[i]
is the data for thei
-th node. - Reduce the fully connected
g
to a skeleton graph by performing pairwiseindependence
tests between all vertices usingpairwise_test
. Remove any edges where adjacent vertices are found to be independent according to the test (i.e. the null hypothesis of independence cannot be rejected at significance level1 - α
). - Thin the skeleton
g
by conditionalindependence
testing. Ifx[i] ⫫ x[j] | x[Z]
for some set of variablesZ
(not includingi
andj
) according toconditional_test
(i.e. the null hypothesis of conditional independence cannot be rejected at significance level1 - α
), then the edge betweeni
andj
is removed, and we record the separating set S(i, j) = Z. Independence tests are first performed for conditioning sets of size 1, and repeated for conditioning sets of increasing size, which in most cases limits the number of tests needed. The separating setsS(i, j)
, which records which variables were in the conditioning set that rendered variablesi
andj
independent, are recorded. Ifmax_depth
is an integer, then this procedure is performed on conditioning sets of sizes1:max_depth
, and ifmax_depth == nothing
, then all possible conditioning set sizes are potentially used. - Create a directed graph
dg
fromg
by replacing every undirected edgeX - Y
ing
by the bidirectional edgeX ↔ Y
(i.e. construct two directional edgesX → Y
andY → X
). Orientiation rules 0-3 are then repeatedly applied todg
until no more edges can be oriented:- Rule 0 (orients v-structures):
X ↔ Y ↔ Z
becomesX → Y ← Z
ifY
is not in the separating setS(X, Z)
. - Rule 1 (prevents new v-structures):
X → Y ↔ Z
becomesX → Y → Z
ifX
andZ
are not adjacent. - Rule 2 (avoids cycles):
X → Y → Z ↔ X
becomesX → Y → Z ← X
. - Rule 3: To avoid creating cycles or new v-structures, whenever
X - Y → Z
,X - W → Z
, andX - Z
but there is no edge betweenY
andW
, turn the undirectedX - Z
edge into the directed edgeX → Z
.
- Rule 0 (orients v-structures):
The resulting directed graph (a SimpleDiGraph
from Graphs.jl) is then returned.
Examples
CausalityTools.PairwiseAsymmetricInference
— TypePairwiseAsymmetricInference <: CrossmapMeasure
PairwiseAsymmetricInference(; d::Int = 2, τ::Int = -1, w::Int = 0,
f = Statistics.cor, embed_warn = true)
The pairwise asymmetric inference (PAI) measure McCracken2014 is a version of ConvergentCrossMapping
that searches for neighbors in mixed embeddings (i.e. both source and target variables included); otherwise, the algorithms are identical.
Usage
- Use with
association
to compute the pairwise asymmetric inference measure between variables.
Compatible estimators
Description
The Theiler window w
controls how many temporal neighbors are excluded during neighbor searches (w = 0
means that only the point itself is excluded). f
is a function that computes the agreement between observations and predictions (the default, f = Statistics.cor
, gives the Pearson correlation coefficient).
Embedding
There are many possible ways of defining the embedding for PAI. Currently, we only implement the "add one non-lagged source timeseries to an embedding of the target" approach, which is used as an example in McCracken & Weigel's paper. Specifically: Let S(i)
be the source time series variable and T(i)
be the target time series variable. PairwiseAsymmetricInference
produces regular embeddings with fixed dimension d
and embedding lag τ
as follows:
\[(S(i), T(i+(d-1)\tau, \ldots, T(i+2\tau), T(i+\tau), T(i)))_{i=1}^{N-(d-1)\tau}.\]
In this joint embedding, neighbor searches are performed in the subspace spanned by the first D
variables, while the last variable is to be predicted.
With this convention, τ < 0
implies "past/present values of source used to predict target", and τ > 0
implies "future/present values of source used to predict target". The latter case may not be meaningful for many applications, so by default, a warning will be given if τ > 0
(embed_warn = false
turns off warnings).
Estimation
- Example 1. Estimation with
RandomVectors
estimator. - Example 2. Estimation with
RandomSegment
estimator. - Example 3. Reproducing McCracken & Weigel's results from the original paper.
CausalityTools.PartialCorrelation
— TypePartialCorrelation <: AssociationMeasure
The correlation of two variables, with the effect of a set of conditioning variables removed.
Usage
- Use with
association
to compute the raw partial correlation coefficient. - Use with
independence
to perform a formal hypothesis test for correlated-based conditional independence.
Description
There are several ways of estimating the partial correlation. We follow the matrix inversion method, because for StateSpaceSet
s, we can very efficiently compute the required joint covariance matrix $\Sigma$ for the random variables.
Formally, let $X_1, X_2, \ldots, X_n$ be a set of $n$ real-valued random variables. Consider the joint precision matrix,$P = (p_{ij}) = \Sigma^-1$. The partial correlation of any pair of variables $(X_i, X_j)$, given the remaining variables $\bf{Z} = \{X_k\}_{i=1, i \neq i, j}^n$, is defined as
\[\rho_{X_i X_j | \bf{Z}} = -\dfrac{p_ij}{\sqrt{ p_{ii} p_{jj} }}\]
In practice, we compute the estimate
\[\hat{\rho}_{X_i X_j | \bf{Z}} = -\dfrac{\hat{p}_ij}{\sqrt{ \hat{p}_{ii} \hat{p}_{jj} }},\]
where $\hat{P} = \hat{\Sigma}^{-1}$ is the sample precision matrix.
CausalityTools.PartialMutualInformation
— TypePartialMutualInformation <: MultivariateInformationMeasure
PartialMutualInformation(; base = 2)
The partial mutual information (PMI) measure of conditional association Zhao2016.
Definition
PMI is defined as for variables $X$, $Y$ and $Z$ as
\[PMI(X; Y | Z) = D(p(x, y, z) || p^{*}(x|z) p^{*}(y|z) p(z)),\]
where $p(x, y, z)$ is the joint distribution for $X$, $Y$ and $Z$, and $D(\cdot, \cdot)$ is the extended Kullback-Leibler divergence from $p(x, y, z)$ to $p^{*}(x|z) p^{*}(y|z) p(z)$. See Zhao2016 for details.
Estimation
The PMI is estimated by first estimating a 3D probability mass function using probabilities
, then computing $PMI(X; Y | Z)$ from those probaiblities.
Properties
For the discrete case, the following identities hold in theory (when estimating PMI, they may not).
PMI(X, Y, Z) >= CMI(X, Y, Z)
(where CMI is the Shannon CMI). Holds in theory, but when estimating PMI, the identity may not hold.PMI(X, Y, Z) >= 0
. Holds both in theory and when estimating using discrete estimators.X ⫫ Y | Z => PMI(X, Y, Z) = CMI(X, Y, Z) = 0
(in theory, but not necessarily for estimation).
CausalityTools.PearsonCorrelation
— TypePearsonCorrelation
The Pearson correlation of two variables.
Usage
- Use with
association
to compute the raw Pearson correlation coefficient. - Use with
independence
to perform a formal hypothesis test for pairwise dependence using the Pearson correlation coefficient.
Description
The sample Pearson correlation coefficient for real-valued random variables $X$ and $Y$ with associated samples $\{x_i\}_{i=1}^N$ and $\{y_i\}_{i=1}^N$ is defined as
\[\rho_{xy} = \dfrac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) }{\sqrt{\sum_{i=1}^N (x_i - \bar{x})^2}\sqrt{\sum_{i=1}^N (y_i - \bar{y})^2}},\]
where $\bar{x}$ and $\bar{y}$ are the means of the observations $x_k$ and $y_k$, respectively.
CausalityTools.Phase
— TypePhase <: InstantaneousSignalProperty
Indicates that the instantaneous phases of a signal should be used.
CausalityTools.PoczosSchneiderCMI
— TypePoczosSchneiderCMI <: ConditionalMutualInformationEstimator
PoczosSchneiderCMI(definition = CMIRenyiPoczos(); k = 1, w = 0)
The PoczosSchneiderCMI
ConditionalMutualInformationEstimator
computes conditional mutual informations using a k
-th nearest neighbor approach Poczos2012.
k
is the number of nearest neighbors. w
is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with
association
to computeCMIRenyiPoczos
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Examples
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(PoczosSchneiderCMI(CMIRenyiPoczos(), k = 10), x, z, y) # should be near 0 (and can be negative)
CausalityTools.RMCD
— TypeRMCD <: AssociationMeasure
RMCD(; r, metric = Euclidean(), base = 2)
The recurrence measure of conditional dependence, or RMCD Ramos2017, is a recurrence-based measure that mimics the conditional mutual information, but uses recurrence probabilities.
Usage
- Use with
association
to compute the raw RMCD for pairwise or conditional association. - Use with
IndependenceTest
to perform a formal hypothesis test for pairwise or conditional association.
Description
r
is a mandatory keyword which specifies the recurrence threshold when constructing recurrence matrices. It can be instance of any subtype of AbstractRecurrenceType
from RecurrenceAnalysis.jl. To use any r
that is not a real number, you have to do using RecurrenceAnalysis
first. The metric
is any valid metric from Distances.jl.
Both the pairwise and conditional RMCD is non-negative, but due to round-off error, negative values may occur. If that happens, an RMCD value of 0.0
is returned.
Description
The RMCD measure is defined by
\[I_{RMCD}(X; Y | Z) = \dfrac{1}{N} \sum_{i} \left[ \dfrac{1}{N} \sum_{j} R_{ij}^{X, Y, Z} \log \left( \dfrac{\sum_{j} R_{ij}^{X, Y, Z} \sum_{j} R_{ij}^{Z} }{\sum_{j} \sum_{j} R_{ij}^{X, Z} \sum_{j} \sum_{j} R_{ij}^{Y, Z}} \right) \right],\]
where base
controls the base of the logarithm. $I_{RMCD}(X; Y | Z)$ is zero when $Z = X$, $Z = Y$ or when $X$, $Y$ and $Z$ are mutually independent.
Our implementation allows dropping the third/last argument, in which case the following mutual information-like quantitity is computed (not discussed in Ramos2017.
\[I_{RMCD}(X; Y) = \dfrac{1}{N} \sum_{i} \left[ \dfrac{1}{N} \sum_{j} R_{ij}^{X, Y} \log \left( \dfrac{\sum_{j} R_{ij}^{X} R_{ij}^{Y} }{\sum_{j} R_{ij}^{X, Y}} \right) \right]\]
Estimation
- Example 1. Pairwise versus conditional RMCD.
CausalityTools.Rahimzamani
— TypeRahimzamani <: ConditionalMutualInformationEstimator
Rahimzamani(k = 1, w = 0)
The Rahimzamani
ConditionalMutualInformationEstimator
is designed for data that can be mixtures of discrete and continuous data Rahimzamani2018.
Compatible definitions
Usage
- Use with
association
to compute aCMIShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
This estimator is very similar to the GaoKannanOhViswanath
mutual information estimator, but has been expanded to the conditional mutual information case.
k
is the number of nearest neighbors. w
is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Examples
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(Rahimzamani(; k = 10), x, z, y) # should be near 0 (and can be negative)
CausalityTools.RandomSegment
— TypeRandomSegment <: CrossmapEstimator
RandomSegment(definition::CrossmapMeasure; libsizes::Int, rng = Random.default_rng())
Cross map once over N = length(libsizes)
different "point libraries", where point indices are selected as time-contiguous segments with random starting points.
This is method 2 from Luo2015. See CrossmapEstimator
for an in-depth explanation of what "library" means in this context.
Description
The cardinality of the point index segments are given by libsizes
. One segment with a randomly selected starting point is picked per L ∈ libsizes
, and the i
-th point index segment has cardinality k = libsizes[i]
.
The starting point for each library is selected independently of other libraries. A user-specified rng
may be specified for reproducibility. If the time series you're cross mapping between have length M
, and Lᵢ < M
for any Lᵢ ∈ libsizes
, then an error will be thrown.
A user-specified rng
may be specified for reproducibility.
Returns
The return type when used with association
depends on the type of libsizes
.
- If
libsizes
is anInt
(a single library), then a single cross-map estimate is returned. - If
libsizes
is anAbstractVector{Int}
(multiple libraries), then a vector of cross-map estimates is returned –- one per library.
See also: CrossmapEstimator
.
CausalityTools.RandomVectors
— TypeRandomVectors <: CrossmapEstimator
RandomVectors(definition::CrossmapMeasure; libsizes, replace = false,
rng = Random.default_rng())
Cross map once over N = length(libsizes)
different "point libraries", where point indices are selected randomly (not considering time ordering).
This is method 3 from Luo2015. See CrossmapEstimator
for an in-depth explanation of what "library" means in this context.
Description
The cardinality of the point libraries are given by libsizes
. One set of random point indices is selected per L ∈ libsizes
, and the i
-th library has cardinality k = libsizes[i]
.
Point indices within each library are randomly selected, independently of other libraries. A user-specified rng
may be specified for reproducibility. The replace
argument controls whether sampling is done with or without replacement. If the time series you're cross mapping between have length M
, and Lᵢ < M
for any Lᵢ ∈ libsizes
, then you must set replace = true
.
Returns
The return type when used with association
depends on the type of libsizes
.
- If
libsizes
is anInt
(a single library), then a single cross-map estimate is returned. - If
libsizes
is anAbstractVector{Int}
(multiple libraries), then a vector of cross-map estimates is returned –- one per library.
See also: CrossmapEstimator
.
CausalityTools.RenyiDivergence
— TypeRenyiDivergence <: DivergenceOrDistance
RenyiDivergence(q; base = 2)
The Rényi divergence of positive order q
.
Usage
- Use with
association
to compute the compute the Rényi divergence between two pre-computed probability distributions, or from raw data using one of the estimators listed below.
Compatible estimators
Description
The Rényi divergence between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace
$\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as vanErven2014.
\[D_{q}(P_Y(\Omega) || P_Y(\Omega)) = \dfrac{1}{q - 1} \log \sum_{\omega \in \Omega}p_x(\omega)^{q}p_y(\omega)^{1-\alpha}\]
Implements
information
. Used to compute the Rényi divergence between two pre-computed probability distributions. If used withRelativeAmount
, the KL divergence may be undefined to due some outcomes having zero counts. Use some otherProbabilitiesEstimator
likeBayesianRegularization
to ensure all estimated probabilities are nonzero.
Distances.jl also defines RenyiDivergence
. Quality it if you're loading both packages, i.e. do association(CausalityTools.RenyiDivergence(), x, y)
.
Estimation
- Example 1: From precomputed probabilities
- Example 2:
JointProbabilities
withOrdinalPatterns
outcome space
CausalityTools.SMeasure
— TypeSMeasure < ClosenessMeasure
SMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)
SMeasure
is a bivariate association measure from Arnhold1999 and Quiroga2000 that measure directional dependence between two input (potentially multivariate) time series.
Note that τx
and τy
are negative; see explanation below.
Usage
- Use with
association
to compute the raw s-measure statistic. - Use with
independence
to perform a formal hypothesis test for directional dependence.
Description
The steps of the algorithm are:
- From input time series $x(t)$ and $y(t)$, construct the delay embeddings (note the positive sign in the embedding lags; therefore inputs parameters
τx
andτy
are by convention negative).
\[\begin{align*} \{\bf{x}_i \} &= \{(x_i, x_{i+\tau_x}, \ldots, x_{i+(d_x - 1)\tau_x}) \} \\ \{\bf{y}_i \} &= \{(y_i, y_{i+\tau_y}, \ldots, y_{i+(d_y - 1)\tau_y}) \} \\ \end{align*}\]
Let $r_{i,j}$ and $s_{i,j}$ be the indices of the
K
-th nearest neighbors of $\bf{x}_i$ and $\bf{y}_i$, respectively. Neighbors closed thanw
time indices are excluded during searches (i.e.w
is the Theiler window).Compute the the mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, using the indices $r_{i, j}$.
\[R_i^{(k)}(x) = \dfrac{1}{k} \sum_{i=1}^{k}(\bf{x}_i, \bf{x}_{r_{i,j}})^2\]
- Compute the y-conditioned mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, now using the indices $s_{i,j}$.
\[R_i^{(k)}(x|y) = \dfrac{1}{k} \sum_{i=1}^{k}(\bf{x}_i, \bf{x}_{s_{i,j}})^2\]
- Define the following measure of independence, where $0 \leq S \leq 1$, and low values indicate independence and values close to one occur for synchronized signals.
\[S^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{R_i^{(k)}(x)}{R_i^{(k)}(x|y)}\]
Input data
The algorithm is slightly modified from Arnhold1999 to allow univariate timeseries as input.
- If
x
andy
areStateSpaceSet
s then usex
andy
as is and ignore the parametersdx
/τx
anddy
/τy
. - If
x
andy
are scalar time series, then createdx
anddy
dimensional embeddings, respectively, of bothx
andy
, resulting inN
differentm
-dimensional embedding points $X = \{x_1, x_2, \ldots, x_N \}$ and $Y = \{y_1, y_2, \ldots, y_N \}$.τx
andτy
control the embedding lags forx
andy
. - If
x
is a scalar-valued vector andy
is aStateSpaceSet
, or vice versa, then create an embedding of the scalar timeseries using parametersdx
/τx
ordy
/τy
.
In all three cases, input StateSpaceSets are length-matched by eliminating points at the end of the longest StateSpaceSet (after the embedding step, if relevant) before analysis.
See also: ClosenessMeasure
.
CausalityTools.SurrogateAssociationTest
— TypeSurrogateAssociationTest <: IndependenceTest
SurrogateAssociationTest(est_or_measure;
nshuffles::Int = 100,
surrogate = RandomShuffle(),
rng = Random.default_rng(),
show_progress = false,
)
A surrogate-data based generic (conditional) independence test for assessing whether the association between variables X
and Y
are independent, potentially conditioned on a third variable Z
.
Compatible estimators and measures
- Compatible with
AssociationMeasure
s that measure some sort of pairwise or conditional association.
You must yourself determine whether using a particular measure is meaningful, and what it means.
If used with a TransferEntropy
measure such as TEShannon
, then the source variable is always shuffled, and the target and conditional variable are left unshuffled.
Usage
- Use with
independence
to perform a surrogate test with input data. This will return aSurrogateAssociationTestResult
.
Description
This is a generic one-sided hypothesis test that checks whether x
and y
are independent (given z
, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between the input variables.
The test first estimates the desired statistic using est_or_measure
on the input data. Then, the first input variable is shuffled nshuffled
times according to the given surrogate
method (each type of surrogate
represents a distinct null hypothesis). For each shuffle, est_or_measure
is recomputed and the results are stored.
Examples
- Example 1:
SMeasure
test for pairwise independence. - Example 2:
DistanceCorrelation
test for pairwise independence. - Example 3:
PartialCorrelation
test for conditional independence. - Example 4:
MIShannon
test for pairwise independence on categorical data. - Example 5:
CMIShannon
test for conditional independence on categorical data. - Example 6:
MCR
test for pairwise and conditional independence.
CausalityTools.SurrogateAssociationTestResult
— TypeSurrogateAssociationTestResult(m, m_surr, pvalue)
Holds the result of a SurrogateAssociationTest
. m
is the measure computed on the original data. m_surr
is a vector of the measure computed on permuted data, where m_surr[i]
is the measure compute on the i
-th permutation. pvalue
is the one-sided p
-value for the test.
CausalityTools.SymbolicTransferEntropy
— TypeSymbolicTransferEntropy <: TransferEntropyEstimator
SymbolicTransferEntropy(definition = TEShannon(); m = 3, τ = 1,
lt = ComplexityMeasures.isless_rand
A convenience estimator for symbolic transfer entropy Staniek2008.
Compatible measures
Description
Symbolic transfer entropy consists of two simple steps. First, the input time series are encoded using codify
with the CodifyVariables
discretization and the OrdinalPatterns
outcome space. This transforms the input time series into integer time series. Transfer entropy entropy is then estimated from the encoded time series by applying
Transfer entropy is then estimated as usual on the encoded timeseries with the embedding dictated by definition
and the JointProbabilities
estimator.
Examples
CausalityTools.TERenyiJizba
— TypeTERenyiJizba() <: TransferEntropy
The Rényi transfer entropy from Jizba2012.
Usage
- Use with
association
to compute the raw transfer entropy. - Use with an
IndependenceTest
to perform a formal hypothesis test for pairwise and conditional dependence.
Description
The transfer entropy from source $S$ to target $T$, potentially conditioned on $C$ is defined as
\[\begin{align*} TE(S \to T) &:= I_q^{R_J}(T^+; S^- | T^-) \\ TE(S \to T | C) &:= I_q^{R_J}(T^+; S^- | T^-, C^-), \end{align*},\]
where $I_q^{R_J}(T^+; S^- | T^-)$ is Jizba et al. (2012)'s definition of conditional mutual information (CMIRenyiJizba
). The -
and +
subscripts on the marginal variables $T^+$, $T^-$, $S^-$ and $C^-$ indicate that the embedding vectors for that marginal are constructed using present/past values and future values, respectively.
Estimation
Estimating Jizba's Rényi transfer entropy is a bit complicated, since it doesn't have a dedicated estimator. Instead, we re-write the Rényi transfer entropy as a Rényi conditional mutual information, and estimate it using an EntropyDecomposition
with a suitable discrete/differential Rényi entropy estimator from the list below as its input.
Estimator | Sub-estimator | Principle |
---|---|---|
EntropyDecomposition | LeonenkoProzantoSavani | Four-entropies decomposition |
EntropyDecomposition | ValueBinning | Four-entropies decomposition |
EntropyDecomposition | Dispersion | Four-entropies decomposition |
EntropyDecomposition | OrdinalPatterns | Four-entropies decomposition |
EntropyDecomposition | UniqueElements | Four-entropies decomposition |
EntropyDecomposition | TransferOperator | Four-entropies decomposition |
Any of these estimators must be given as input to a `CMIDecomposition estimator.
Estimation
- Example 1:
EntropyDecomposition
withTransferOperator
outcome space.
CausalityTools.TEShannon
— TypeTEShannon <: TransferEntropy
TEShannon(; base = 2; embedding = EmbeddingTE()) <: TransferEntropy
The Shannon-type transfer entropy measure.
Usage
- Use with
association
to compute the raw transfer entropy. - Use with an
IndependenceTest
to perform a formal hypothesis test for pairwise and conditional dependence.
Description
The transfer entropy from source $S$ to target $T$, potentially conditioned on $C$ is defined as
\[\begin{align*} TE(S \to T) &:= I^S(T^+; S^- | T^-) \\ TE(S \to T | C) &:= I^S(T^+; S^- | T^-, C^-) \end{align*}\]
where $I(T^+; S^- | T^-)$ is the Shannon conditional mutual information (CMIShannon
). The -
and +
subscripts on the marginal variables $T^+$, $T^-$, $S^-$ and $C^-$ indicate that the embedding vectors for that marginal are constructed using present/past values and future values, respectively.
Estimation
- Example 1:
EntropyDecomposition
withTransferOperator
outcome space. - Example 2: Estimation using the
SymbolicTransferEntropy
estimator.
CausalityTools.TEVars
— TypeTEVars(Tf::Vector{Int}, T::Vector{Int}, S::Vector{Int})
TEVars(Tf::Vector{Int}, T::Vector{Int}, S::Vector{Int}, C::Vector{Int})
TEVars(;Tf = Int[], T = Int[], S = Int[], C = Int[]) -> TEVars
Which axes of the state space correspond to the future of the target (Tf
), the present/past of the target (T
), the present/past of the source (S
), and the present/past of any conditioned variables (C
)? This information is used by the transfer entropy estimators to ensure that marginal distributions are computed correctly.
Indices correspond to variables of the embedding, or, equivalently, colums of a StateSpaceSet
.
- The three-argument constructor assumes there will be no conditional variables.
- The four-argument constructor assumes there will be conditional variables.
CausalityTools.TransferEntropy
— TypeTransferEntropy <: AssociationMeasure
The supertype of all transfer entropy measures. Concrete subtypes are
CausalityTools.TransferEntropyEstimator
— TypeThe supertype of all dedicated transfer entropy estimators.
CausalityTools.VariationDistance
— TypeVariationDistance <: DivergenceOrDistance
The variation distance.
Usage
- Use with
association
to compute the compute the variation distance between two pre-computed probability distributions, or from raw data using one of the estimators listed below.
Compatible estimators
Description
The variation distance between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace
$\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as
\[D_{V}(P_Y(\Omega) || P_Y(\Omega)) = \dfrac{1}{2} \sum_{\omega \in \Omega} | p_x(\omega) - p_y(\omega) |\]
Examples
- Example 1: From precomputed probabilities
- Example 2:
JointProbabilities
withOrdinalPatterns
outcome space
CausalityTools.Zhu1
— TypeZhu1 <: TransferEntropyEstimator
Zhu1(k = 1, w = 0, base = MathConstants.e)
The Zhu1
transfer entropy estimator Zhu2015 for normalized input data (as described in Zhu2015) for both for pairwise and conditional transfer entropy.
Compatible definitions
Usage
- Use with
association
to computeTEShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
This estimator approximates probabilities within hyperrectangles surrounding each point xᵢ ∈ x
using using k
nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension to the entropy estimator in Singh2003.
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
Description
For a given points in the joint embedding space jᵢ
, this estimator first computes the distance dᵢ
from jᵢ
to its k
-th nearest neighbor. Then, for each point mₖ[i]
in the k
-th marginal space, it counts the number of points within radius dᵢ
.
The Shannon transfer entropy is then computed as
\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]
where the index k
references the three marginal subspaces T
, TTf
and ST
for which neighbor searches are performed. Here this estimator has been modified to allow for conditioning too (a simple modification to Lindner2011's equation 5 and 6).
Example
using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Zhu1(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)
CausalityTools._convert_logunit
— Method_convert_logunit(h_a::Real, , to) → h_b
Convert a number h_a
computed with logarithms to base a
to an entropy h_b
computed with logarithms to base b
. This can be used to convert the "unit" of e.g. an entropy.
CausalityTools.association
— Methodassociation(estimator::AssociationMeasureEstimator, x, y, [z, ...]) → r
association(definition::AssociationMeasure, x, y, [z, ...]) → r
Estimate the (conditional) association between input variables x, y, z, …
using the given estimator
(an AssociationMeasureEstimator
) or definition
(an AssociationMeasure
).
The type of the return value r
depends on the measure
/estimator
. The interpretation of the returned value also depends on the specific measure and estimator used.
Examples
The examples section of the online documentation has numerous using association
.
CausalityTools.backwards_eliminate!
— Methodbackwards_eliminate!(alg::OCE, parents::OCESelectedParents, x, i; verbose)
Algorithm 2.2 in Sun et al. (2015). Perform backward elimination for the i
-th variable in x
, given the previously inferred parents
, which were deduced using parameters in alg
. Modifies parents
in-place.
CausalityTools.codified_marginals
— Functioncodified_marginals(o::OutcomeSpace, x::VectorOrStateSpaceSet...)
Encode/discretize each input vector (e.g. timeseries) xᵢ ∈ x
according to a procedure determined by o
.
For some outcome spaces, the encoding is sequential (i.e. time ordering matters). Any xᵢ ∈ X
that are multidimensional (StateSpaceSet
s) will be encoded column-wise, i.e. each column of xᵢ
is treated as a timeseries and is encoded separately.
This is useful for discretizing input data when computing some MultivariateInformationMeasure
. This method is used internally by both the JointProbabilities
and EntropyDecomposition
estimators to handle discretization.
Supported estimators
ValueBinning
. Bin visitation frequencies are counted in the joint spaceXY
, then marginal visitations are obtained from the joint bin visits. This behaviour is the same for bothFixedRectangularBinning
andRectangularBinning
(which adapts the grid to the data). When usingFixedRectangularBinning
, the range along the first dimension is used as a template for all other dimensions.OrdinalPatterns
. Each timeseries is separatelycodify
-ed by embedding the timeseries, then sequentially encoding the ordinal patterns of the embedding vectors.Dispersion
. Each timeseries is separatelycodify
-ed by embedding the timeseries, then sequentially encoding the embedding vectors according to their dispersion pattern (which for each embedding vector is computed relative to all other embedding vectors).CosineSimilarityBinning
. Each timeseries is separatelycodify
-ed by embedding the timeseries, the encoding the embedding points in a in a sequential manner according to the cosine similarity of the embedding vectors.UniqueElements
. Each timeseries iscodify
-ed according to its unique values (i.e. each unique element gets assigned a specific integer).
More implementations are possible.
CausalityTools.cpdag
— Methodcpdag(alg::PC, skeleton::SimpleDiGraph, separating_sets::Dict{Edge, Vector{Int}}) → dg::SimpleDiGraph
Orient edges in the skeleton
graph using the given separating_sets
using algorithm 2 in Kalisch2008 and return the directed graph cpdag
.
Description
First, a directed graph dg
is constructed from skeleton
, such that every undirected edge X - Y
in skeleton
is replaced by the bidirectional edge X ↔ Y
in dg
. In practices, for each X ↔ Y
, we construct two directional edges X → Y
and Y → X
.
Orientiation rules 0-3 are then applied to dg
. We use the rules as stated in Colombo & Maathuis, 2014.
- Rule 0 (orients v-structures):
X ↔ Y ↔ Z
becomesX → Y ← Z
ifY
is not in the separating setS(X, Z)
. - Rule 1 (prevents new v-structures):
X → Y ↔ Z
becomesX → Y → Z
ifX
andZ
are not adjacent. - Rule 2 (avoids cycles):
X → Y → Z ↔ X
becomes X → Y → Z ← X` - Rule 3: To avoid creating cycles or new v-structures, X X | | | | | | Y | W becomes Y | W ↘ | ↙ ↘ ↓ ↙ Z Z
CausalityTools.crossmap
— Functioncrossmap(measure::CrossmapEstimator, t::AbstractVector, s::AbstractVector) → ρ::Real
crossmap(measure::CrossmapEstimator, est, t::AbstractVector, s::AbstractVector) → ρ::Vector
crossmap(measure::CrossmapEstimator, t̄::AbstractVector, S̄::AbstractStateSpaceSet) → ρ
Compute the cross map estimates between between raw time series t
and s
(and return the real-valued cross-map statistic ρ
). If a CrossmapEstimator
est
is provided, cross mapping is done on random subsamples of the data, where subsampling is dictated by est
(a vector of values for ρ
is returned).
Alternatively, cross-map between time-aligned time series t̄
and source embedding S̄
that have been constructed by jointly (pre-embedding) some input data.
This is just a wrapper around predict
that simply returns the correspondence measure between the source and the target.
CausalityTools.distance_covariance
— Methoddistance_covariance(x, y) → dcov::Real
distance_covariance(x, y, z) → pdcov::Real
Compute the empirical/sample distance covariance (Székely et al., 2007)[Székely2007] between StateSpaceSets x
and y
. Alternatively, compute the partial distance covariance pdcov
.
CausalityTools.distance_variance
— Methoddistance_variance(x) → dvar::Real
Compute the empirical/sample distance variance (Székely et al., 2007)[Székely2007] for StateSpaceSet x
.
CausalityTools.eliminate_loop!
— Methodeliminate_loop!(alg::OCE, parents::OCESelectedParents, xᵢ; verbose = false)
Inner portion of algorithm 2.2 in Sun et al. (2015). This method is called in an external while-loop that handles the variable elimination step in their line 3.
CausalityTools.estimate_from_marginals
— Functionestimate_from_marginals(est::TransferEntropyEstimator,
S::AbstractStateSpaceSet,
T::AbstractStateSpaceSet,
T⁺::AbstractStateSpaceSet,
C::AbstractStateSpaceSet)
Convenience method for TransferEntropyEstimator
s that allows easier integration with LocalPermutationTest
. Zhu1
and [Lindner
])@ref) uses this method.
CausalityTools.estimator_with_overridden_parameters
— Methodestimator_with_overridden_parameters(definition, lower_level_estimator) → e::typeof(lower_level_estimator)
Given some higher-level definition
of an information measure, which is to be estimated using some lower_level_estimator
, return a modified version of the estimator in which its parameter have been overriden by any overlapping parameters from the defintiion
.
This method is explicitly extended for each possible decomposition.
CausalityTools.fishers_z
— Methodfishers_z(p̂)
Compute Fisher's z-transform on the sample partial correlation coefficient p̂
(computed as the correlation between variables i
and j
, given the remaining variables):
\[Z(V_i, V_j | \bf{S}) = \dfrac{1}{2} \log{\left( \dfrac{1 + \hat{p}(V_i, V_j | \bf{S})}{1 - \hat{p}(V_i, V_j | \bf{S})} \right) }\]
CausalityTools.independence
— Methodindependence(test::IndependenceTest, x, y, [z]) → summary
Perform the given IndependenceTest
test
on data x
, y
and z
. If only x
and y
are given, test
must provide a bivariate association measure. If z
is given too, then test
must provide a conditional association measure.
Returns a test summary
, whose type depends on test
.
Compatible independence tests
CausalityTools.infer_graph
— Functioninfer_graph(algorithm::GraphAlgorithm, x) → g
Infer graph from input data x
using the given algorithm
.
Returns g
, whose type depends on algorithm
.
CausalityTools.jdd
— Methodjdd(measure::JointDistanceDistribution, source, target) → Δ
Compute the joint distance distribution Amigo2018 from source
to target
using the given JointDistanceDistribution
measure.
Returns the distribution Δ
from the paper directly (example). Use JointDistanceDistributionTest
to perform a formal indepencence test.
CausalityTools.library_indices
— Functionlibrary_indices(measure::CCMLike, est::CrossmapEstimator, i::Int, target, source)
Produce (randomly, if relevant) the i
-th subset of indices for a CrossmapEstimator
that is being applied to target
and source
.
CausalityTools.marginal
— Methodmarginal(p::Probabilities; dims = 1:ndims(p))
marginal(c::Counts; dims = 1:ndims(p))
Given a set of counts c
(a contingency table), or a multivariate probability mass function p
, return the marginal counts/probabilities along the given dims
.
CausalityTools.marginal_indices
— Methodmarginal_indices(x)
Returns a column vector v
with the same number of elements as there are unique elements in x
. v[i]
is the indices of elements in x
matching v[i]
.
For example, if the third unique element in x
, and the element u₃ = unique(x)[3]
appears four times in x
, then v[3]
is a vector of four integers indicating the position of the elements matching u₃
.
CausalityTools.marginal_probs_from_μ
— Methodmarginal_probs_from_μ(seleced_axes, visited_bins, iv::InvariantMeasure, inds_non0measure)
Estimate marginal probabilities from a pre-computed invariant measure, given a set of visited bins, an invariant measure and the indices of the positive-measure bins. The indices in selected_axes
determines which marginals are selected.
CausalityTools.max_inputs_vars
— Methodmax_inputs_vars(m::AssociationMeasure) → nmax::Int
Return the maximum number of variables is that the measure can be computed for.
For example, MIShannon
cannot be computed for more than 2 variables.
CausalityTools.max_segmentlength
— Functionmax_segmentlength(x::AbstractVector, measure::CrossmapMeasure)
Given an input vector x
, which is representative of the size of the other input vectors too, compute the maximum segment/library length that can be used for predictions.
CausalityTools.min_inputs_vars
— Methodmin_inputs_vars(m::AssociationMeasure) → nmin::Int
Return the minimum number of variables is that the measure can be computed for.
For example, CMIShannon
requires 3 input variables.
CausalityTools.optimize_marginals_te
— Functionoptimize_marginals_te([scheme = OptimiseTraditional()], s, t, [c]) → EmbeddingTE
Optimize marginal embeddings for transfer entropy computation from source time series s
to target time series t
, conditioned on c
if c
is given, using the provided optimization scheme
.
CausalityTools.optimize_marginals_te
— Methodoptimize_marginals_te(opt::OptimiseTraditional, s, t, [c]; exclude_source = false) → EmbeddingTE
Optimise the marginals for a transfer entropy analysis from source time series s
to target time series t
, potentially given a conditional time series c
.
If exclude_source == true
, then no optimisation is done for the source. This is useful for SurrogateAssociationTest
, because most surrogate methods accept univariate time series, and if we embed the source and it becomes multidimensional, then we can't create surrogates. A future optimization is to do column-wise surrogate generation.
CausalityTools.pairwise_test
— Methodpairwise_test(p::OCESelectedParents) → pairwise::Bool
If the parent set is empty, return true
(a pairwise test should be performed). If the parent set is nonempty, return false
(a conditional test should performed).
CausalityTools.partial_correlation_from_precision
— Methodpartial_correlation_from_precision(P, i::Int, j::Int)
Given a precision matrix P
, compute the partial correlation of variables i
and j
conditional on all other variables.
CausalityTools.predict
— Methodpredict(measure::CrossmapEstimator, t::AbstractVector, s::AbstractVector) → t̂ₛ, t̄, ρ
predict(measure::CrossmapEstimator, t̄::AbstractVector, S̄::AbstractStateSpaceSet) → t̂ₛ
Perform point-wise cross mappings between source embeddings and target time series according to the algorithm specified by the given cross-map measure
(e.g. ConvergentCrossMapping
or PairwiseAsymmetricInference
).
- First method: Jointly embeds the target
t
and sources
time series (according tomeasure
) to obtain time-index aligned target timeseriest̄
and source embeddingS̄
(which is now aStateSpaceSet
). Then callspredict(measure, t̄, S̄)
(the first method), and returns both the predictionst̂ₛ
, observationst̄
and their correspondenceρ
according tomeasure
. - Second method: Returns a vector of predictions
t̂ₛ
(t̂ₛ
:= "predictions oft̄
based on source embeddingS̄
"), wheret̂ₛ[i]
is the prediction fort̄[i]
. It assumes pre-embedded data which have been correctly time-aligned using a joint embedding, i.e. such thatt̄[i]
andS̄[i]
correspond to the same time index.
Description
For each i ∈ {1, 2, …, N}
where N = length(t) == length(s)
, we make the prediction t̂[i]
(an estimate of t[i]
) based on a linear combination of D + 1
other points in t
, where the selection of points and weights for the linear combination are determined by the D+1
nearest neighbors of the point S̄[i]
. The details of point selection and weights depend on measure
.
Note: Some CrossmapMeasure
s may define more general mapping procedures. If so, the algorithm is described in their docstring.
CausalityTools.rank_transformation
— Methodrank_transformation(x::AbstractVector)
rank_transformation(x::AbstractStateSpaceSet) → ranks::NTuple{D, Vector}
Rank-transform each variable/column of the length-n
D
-dimensional StateSpaceSet x
and return the rank-transformed variables as a D
-tuple of length-n
vectors.
Returns the unscaled ranks
. Divide by n
to get an approximation to the empirical cumulative distribution function (ECDF) x
.
Description
Modulo division by n
, rank_transformation
does roughly the same as naively computing the ECDF as
[count(xᵢ .<= x) for xᵢ in x] / length(x)
but an order of magnitude faster and with roughly three orders of magnitude less allocations. The increased efficiency of this function relative to naively computing the ECDF is because it uses sorting of the input data to determine ranks, arbitrarily breaking ties according to the sorting algorithm. Rank ties can therefore never occur, and equal values are assigned different but close ranks. To preserve ties, which you might want to do for example when dealing with categorical or integer-valued data, use (the much slower) empcdf
.
CausalityTools.s_measure
— Methods_measure(measure::SMeasure, x::VectorOrStateSpaceSet, y::VectorOrStateSpaceSet) → s ∈ [0, 1]
Compute the given measure
to quantify the directional dependence between univariate/multivariate time series x
and y
.
Returns a scalar s
where s = 0
indicates independence between x
and y
, and higher values indicate synchronization between x
and y
, with complete synchronization for s = 1.0
.
Example
using CausalityTools
x, y = rand(1000), rand(1000)
# 4-dimensional embedding for `x`, 5-dimensional embedding for `y`
m = SMeasure(dx = 4, τx = 3, dy = 5, τy = 1)
association(m, x, y)
CausalityTools.select_parents
— Methodselect_parents(alg::OCE, x)
The parent selection step of the OCE
algorithm, which identifies the parents of each xᵢ ∈ x
, assuming that x
must be integer-indexable, i.e. x[i]
yields the i
-th variable.
CausalityTools.skeleton
— Methodskeleton(algorithm::PC, x) → (g, s)
Infer the skeleton graph for the variables x
using the provided algorithm
. x
must be some iterable where the i
-th variable can be accessed as x[i]
.
Return a tuple of the undirected skeleton graph g::SimpleGraph
, and the separating sets s::Dict{SimpleEdge, Vector{Int}})
.
CausalityTools.skeleton_conditional!
— Methodskeleton_conditional!(alg::PC, graph, x, conditional_test::IndependenceTest)
Thin the skeleton graph
, where each vertex is represented by the data x[i]
, by using conditional_test
. Whenever x[i] ⫫ x[j] | x[S]
for some set of variables not including i
and j
, the edge between i
and j
is removed, and S
is stored in the separating set for i
and j
. This is essentially algorithm 3.2 in Colombo2014, for the cases 𝓁 >= 1
.
Modifies graph
in-place.
CausalityTools.skeleton_unconditional!
— Methodskeleton_unconditional!(alg::PC, g::SimpleGraph, x)
Perform pairwise independence tests between all vertices in the graph
, where each vertex is represented by the data x[i]
, and remove any edges in graph
where adjacent vertices are found to be independent according to the given independence test
. The null hypothesis of independence is rejected whenever the p-value is below α
.
This is essentially algorithm 3.2 in Colombo2014, but only considering the case 𝓁 = 0
.
Modifies graph
in-place.
If alg.pairwise_test
is a directed test, then edges are considered one-by-one. If alg.pairwise_test
is not a directed test, then edges (X → Y,
Y → X`) are considered simultaneously.
CausalityTools.te_embed
— Methodte_embed(source::VectorOr1DDataset, target::VectorOr1DDataset, p::EmbeddingTE) → (points, vars, τs)
te_embed(source::VectorOr1DDataset, target::VectorOr1DDataset, cond::VectorOr1DDataset, p::EmbeddingTE) → (points, vars, τs)
Generalised delay reconstruction of source
and target
(and cond
if provided) for transfer entropy computation using embedding parameters provided by the EmbeddingTE
instance p
.
Returns a tuple of the embedded points
, vars
(a TEVars
instance that keeps track of which variables of the embedding belong to which marginals of the reconstruction; indices are: source = 1, target = 2, cond = 3), and a tuple τs
, which stores the lags for each variable of the reconstruction.
CausalityTools.tolevels!
— Methodtolevels!(levels, x) → levels, dict
tolevels(x) → levels, dict
Apply the bijective map $f : \mathcal{Q} \to \mathbb{N}^+$ to each x[i]
and store the result in levels[i]
, where levels
is a pre-allocated integer vector such that length(x) == length(levels)
.
$\mathcal{Q}$ can be any space, and each $q \in \mathcal{Q}$ is mapped to a unique integer in the range 1, 2, …, length(unique(x))
. This is useful for integer-encoding categorical data such as strings, or other complex discrete data structures.
The single-argument method allocated a levels
vector internally.
dict
gives the inverse mapping.
CausalityTools.verify_decomposition_entropy_type
— Methodverify_decomposition_entropy_type(
definition::MultivariateInformationMeasure,
est::Union{DiscreteInfoEstimator, DifferentialInfoEstimator}
)
Check that we can actually decompose the definition
into est.definition
. The default is to do nothing. Certain definitions may override (e.g. CMIRenyiJizba
does so).
ComplexityMeasures.codify
— Methodcodify(encoding::CodifyPoints{N}, x::Vararg{<:AbstractStateSpaceSet, N})
Codify each timeseries xᵢ ∈ x
according to the given encoding
.
Examples
x = StateSpaceSet(rand(10000, 2))
y = StateSpaceSet(rand(10000, 3))
z = StateSpaceSet(rand(10000, 2))
# For `x`, we use a relative mean encoding.
ex = RelativeMeanEncoding(0.0, 1.0, n = 3)
# For `y`, we use a combination encoding.
ey = CombinationEncoding(
RelativeMeanEncoding(0.0, 1.0, n = 2),
OrdinalPatternEncoding(3)
)
# For `z`, we use ordinal patterns to encode.
ez = OrdinalPatternEncoding(2)
# Codify two input datasets gives a 2-tuple of Vector{Int}
codify(CodifyPoints(ex, ey), x, y)
# Codify three input datasets gives a 3-tuple of Vector{Int}
codify(CodifyPoints(ex, ey, ez), x, y, z)
ComplexityMeasures.codify
— Methodcodify(d::CodifyVariables, x::Vararg{<:AbstractStateSpaceSet, N})
codify(d::CodifyPoints, x::Vararg{<:AbstractStateSpaceSet, N})
Codify each timeseries xᵢ ∈ x
according to the given encoding/discretization d
.
Compatible discretizations
Examples
using CausalityTools
# Sliding window encoding
x = [0.1, 0.2, 0.3, 0.2, 0.1, 0.0, 0.5, 0.3, 0.5]
xc1 = codify(CodifyVariables(OrdinalPatterns(m=2)), x) # should give [1, 1, 2, 2, 2, 1, 2, 1]
xc2 = codify(OrdinalPatterns(m=2), x) # equivalent
length(xc1) < length(x) # should be true, because `OrdinalPatterns` delay embeds.
# Point-by-point encoding
x, y = StateSpaceSet(rand(100, 3)), StateSpaceSet(rand(100, 3))
cx, cy = codify(CodifyPoints(OrdinalPatternEncoding(3)), x, y)
ComplexityMeasures.counts
— Methodcounts(o::UniqueElements, x₁, x₂, ..., xₙ) → Counts{N}
counts(encoding::CodifyPoints, x₁, x₂, ..., xₙ) → Counts{N}
counts(encoding::CodifyVariables, x₁, x₂, ..., xₙ) → Counts{N}
Construct an N
-dimensional contingency table from the input iterables x₁, x₂, ..., xₙ
which are such that length(x₁) == length(x₂) == ⋯ == length(xₙ)
.
If x₁, x₂, ..., xₙ
are already discrete, then use UniqueElements
as the first argument to directly construct the joint contingency table.
If x₁, x₂, ..., xₙ
need to be discretized, provide as the first argument
CodifyPoints
(encodes every point in each of the input variablesxᵢ
s individually)CodifyVariables
(encodes everyxᵢ
individually using a sliding window encoding). NB: If using differentOutcomeSpace
s for the differentxᵢ
, thentotal_outcomes
must be the same for every outcome space.
Examples
# Discretizing some non-discrete data using a sliding-window encoding for each variable
x, y = rand(100), rand(100)
c = CodifyVariables(OrdinalPatterns(m = 4))
counts(c, x, y)
# Discretizing the data by binning each individual data point
binning = RectangularBinning(3)
encoding = RectangularBinEncoding(binning, [x; y]) # give input values to ensure binning covers all data
c = CodifyPoints(encoding)
counts(c, x, y)
# Counts table for already discrete data
n = 50 # all variables must have the same number of elements
x = rand(["dog", "cat", "mouse"], n)
y = rand(1:3, n)
z = rand([(1, 2), (2, 1)], n)
counts(UniqueElements(), x, y, z)
See also: CodifyPoints
, CodifyVariables
, UniqueElements
, OutcomeSpace
, probabilities
.
ComplexityMeasures.information
— Methodinformation(est::MultivariateInformationMeasureEstimator, x...)
Estimate some MultivariateInformationMeasure
on input data x...
, using the given MultivariateInformationMeasureEstimator
.
This is just a convenience wrapper around association
(est, x...)
.
ComplexityMeasures.probabilities
— Methodprobabilities(o::UniqueElements, x₁, x₂, ..., xₙ) → Counts{N}
probabilities(encoding::CodifyPoints, x₁, x₂, ..., xₙ) → Counts{N}
probabilities(encoding::CodifyVariables, x₁, x₂, ..., xₙ) → Counts{N}
Construct an N
-dimensional Probabilities
array from the input iterables x₁, x₂, ..., xₙ
which are such that length(x₁) == length(x₂) == ⋯ == length(xₙ)
.
Description
Probabilities are computed by first constructing a joint contingency matrix in the form of a Counts
instance.
If x₁, x₂, ..., xₙ
are already discrete, then use UniqueElements
as the first argument to directly construct the joint contingency table.
If x₁, x₂, ..., xₙ
need to be discretized, provide as the first argument
CodifyPoints
(encodes every point in each of the input variablesxᵢ
s individually)CodifyVariables
(encodes everyxᵢ
individually using a sliding window encoding).
Examples
# Discretizing some non-discrete data using a sliding-window encoding for each variable
x, y = rand(100), rand(100)
c = CodifyVariables(OrdinalPatterns(m = 4))
probabilities(c, x, y)
# Discretizing the data by binning each individual data point
binning = RectangularBinning(3)
encoding = RectangularBinEncoding(binning, [x; y]) # give input values to ensure binning covers all data
c = CodifyPoints(encoding)
probabilities(c, x, y)
# Joint probabilities for already discretized data
n = 50 # all variables must have the same number of elements
x = rand(["dog", "cat", "mouse"], n)
y = rand(1:3, n)
z = rand([(1, 2), (2, 1)], n)
probabilities(UniqueElements(), x, y, z)
See also: CodifyPoints
, CodifyVariables
, UniqueElements
, OutcomeSpace
.
DelayEmbeddings.embed
— Methodembed(measure::CrossmapMeasure, target::AbstractVector,
sources::AbstractVector...) → emb::StateSpaceSet{D}
Jointly embed the input vector target
, together with one or more vectors s ∈ sources
, according to the given CrossmapMeasure
.
This produces emb
, a D
-dimensional StateSpaceSet
where
- The last column is always the non-lagged
target
variable. Typically, this is the variable we're trying to predict. - The
D-1
first columns are the (non)lagged versions of each source time seriess ∈ sources
. Typically,emb[:, 1:D-1]
is the subspace in which neighborhood searches are done, which forms the basis of cross-mappredict
ions.
StatsAPI.pvalue
— Methodpvalue(test::CorrTest, z, c::Int, n::Int)
Compute the two-sided p-value for the test of the partial correlation coefficient p̂
being zero, where c
is the cardinality of the conditioning set and n
is the number of samples.
- Székely2007Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The annals of statistics, 35(6), 2769-2794.
- Székely2007Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The annals of statistics, 35(6), 2769-2794.