CausalityTools.CausalityToolsModule

CausalityTools

CausalityTools.jl is a package for quantifying associations, independence testing and causal inference.

All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.

Key features

  • Association API: includes measures and their estimators for pairwise, conditional and other forms of association from conventional statistics, from dynamical systems theory, and from information theory: partial correlation, distance correlation, (conditional) mutual information, transfer entropy, convergent cross mapping and a lot more!
  • Independence testing API, which is automatically compatible with every association measure estimator implemented in the package.
  • Causal (network) inference API integrating the association measures and independence testing framework.

Additional features

Building on features from ComplexityMeasures.jl, we also offer

  • Discretization API for multiple (multivariate) input datasets.
  • Multivariate counting and probability estimation API.
  • Multivariate information measure API

Installation

To install the package, run import Pkg; Pkg.add("CausalityTools").
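
As a quick start, here is a minimal hypothetical sketch (both DistanceCorrelation and CorrTest are documented further below and require no dedicated estimator):

using CausalityTools

x, y = rand(1000), rand(1000)

# pairwise association with a measure that requires no estimator
association(DistanceCorrelation(), x, y)

# formal hypothesis test for independence, based on (partial) correlation
independence(CorrTest(), x, y)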

CausalityTools.AmplitudeType
Amplitude <: InstantaneousSignalProperty

Indicates that the instantaneous amplitudes of a signal should be used.

CausalityTools.AssociationMeasureType
AssociationMeasure

The supertype of all association measures.

Abstract implementations

Currently, the association measures are classified by abstract classes listed below. These abstract classes offer common functionality among association measures that are conceptually similar. This makes maintenance and framework extension easier than if each measure was implemented "in isolation".

Concrete implementations

Concrete subtypes are given as input to association. Many of these types require an AssociationMeasureEstimator to compute.

| Type | AssociationMeasure |
|------|--------------------|
| Correlation | PearsonCorrelation |
| Correlation | DistanceCorrelation |
| Closeness | SMeasure |
| Closeness | HMeasure |
| Closeness | MMeasure |
| Closeness (ranks) | LMeasure |
| Closeness | JointDistanceDistribution |
| Cross-mapping | PairwiseAsymmetricInference |
| Cross-mapping | ConvergentCrossMapping |
| Conditional recurrence | MCR |
| Conditional recurrence | RMCD |
| Shared information | MIShannon |
| Shared information | MIRenyiJizba |
| Shared information | MIRenyiSarbu |
| Shared information | MITsallisFuruichi |
| Shared information | PartialCorrelation |
| Shared information | CMIShannon |
| Shared information | CMIRenyiSarbu |
| Shared information | CMIRenyiJizba |
| Shared information | CMIRenyiPoczos |
| Shared information | CMITsallisPapapetrou |
| Information transfer | TEShannon |
| Information transfer | TERenyiJizba |
| Partial mutual information | PartialMutualInformation |
| Information measure | JointEntropyShannon |
| Information measure | JointEntropyRenyi |
| Information measure | JointEntropyTsallis |
| Information measure | ConditionalEntropyShannon |
| Information measure | ConditionalEntropyTsallisAbe |
| Information measure | ConditionalEntropyTsallisFuruichi |
| Divergence | HellingerDistance |
| Divergence | KLDivergence |
| Divergence | RenyiDivergence |
| Divergence | VariationDistance |
CausalityTools.AssociationMeasureEstimatorType
AssociationMeasureEstimator

The supertype of all association measure estimators.

Concrete subtypes are given as input to association.

Abstract subtypes

Concrete implementations

| AssociationMeasure | Estimators |
|--------------------|------------|
| PearsonCorrelation | Not required |
| DistanceCorrelation | Not required |
| PartialCorrelation | Not required |
| SMeasure | Not required |
| HMeasure | Not required |
| MMeasure | Not required |
| LMeasure | Not required |
| JointDistanceDistribution | Not required |
| PairwiseAsymmetricInference | RandomVectors, RandomSegment |
| ConvergentCrossMapping | RandomVectors, RandomSegment |
| MCR | Not required |
| RMCD | Not required |
| MIShannon | JointProbabilities, EntropyDecomposition, KraskovStögbauerGrassberger1, KraskovStögbauerGrassberger2, GaoOhViswanath, GaoKannanOhViswanath, GaussianMI |
| MIRenyiJizba | JointProbabilities, EntropyDecomposition |
| MIRenyiSarbu | JointProbabilities |
| MITsallisFuruichi | JointProbabilities, EntropyDecomposition |
| MITsallisMartin | JointProbabilities, EntropyDecomposition |
| CMIShannon | JointProbabilities, EntropyDecomposition, MIDecomposition, GaussianCMI, FPVP, MesnerShalizi, Rahimzamani |
| CMIRenyiSarbu | JointProbabilities |
| CMIRenyiJizba | JointProbabilities, EntropyDecomposition |
| CMIRenyiPoczos | PoczosSchneiderCMI |
| CMITsallisPapapetrou | JointProbabilities |
| TEShannon | JointProbabilities, EntropyDecomposition, Zhu1, Lindner |
| TERenyiJizba | JointProbabilities |
| PartialMutualInformation | JointProbabilities |
| JointEntropyShannon | JointProbabilities |
| JointEntropyRenyi | JointProbabilities |
| JointEntropyTsallis | JointProbabilities |
| ConditionalEntropyShannon | JointProbabilities |
| ConditionalEntropyTsallisAbe | JointProbabilities |
| ConditionalEntropyTsallisFuruichi | JointProbabilities |
| HellingerDistance | JointProbabilities |
| KLDivergence | JointProbabilities |
| RenyiDivergence | JointProbabilities |
| VariationDistance | JointProbabilities |
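
For example, a minimal sketch of the two calling patterns (assuming default constructors; parameter values are arbitrary):

using CausalityTools

x, y = rand(1000), rand(1000)

# measures marked "Not required" above can be given directly to association
association(PearsonCorrelation(), x, y)

# other measures are computed through one of their compatible estimators;
# here, Shannon mutual information via the KSG1 nearest-neighbor estimator
association(KSG1(; k = 5), x, y)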
CausalityTools.CMIDecompositionType
CMIDecomposition(definition::MultivariateInformationMeasure, 
    est::ConditionalMutualInformationEstimator)

Estimate some multivariate information measure specified by definition, by decomposing it into a combination of conditional mutual information terms.

Usage

Description

Each of the conditional mutual information terms is estimated using est, which can be any ConditionalMutualInformationEstimator. Finally, these estimates are combined according to the relevant decomposition formula.

This estimator is similar to EntropyDecomposition, but definition is expressed as conditional mutual information terms instead of entropy terms.

Examples
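
A minimal sketch, assuming Shannon transfer entropy (TEShannon) can be decomposed into conditional mutual information terms, and using the FPVP estimator documented below (parameter values are arbitrary):

using CausalityTools

x, y = rand(1000), rand(1000)

# estimate Shannon transfer entropy x → y by decomposing it into
# conditional mutual information terms, each estimated with FPVP
est = CMIDecomposition(TEShannon(), FPVP(k = 3))
association(est, x, y)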

See also: ConditionalMutualInformationEstimator, MultivariateInformationMeasureEstimator, MultivariateInformationMeasure.

CausalityTools.CMIRenyiJizbaType
CMIRenyiJizba <: ConditionalMutualInformation
CMIRenyiJizba(; base = 2, q = 1.5)

The Rényi conditional mutual information $I_q^{R_{J}}(X; Y | Z)$ defined in Jizba2012.

Usage

  • Use with association to compute the raw Rényi-Jizba conditional mutual information using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Jizba conditional mutual information.

Compatible estimators

Definition

\[I_q^{R_{J}}(X; Y | Z) = I_q^{R_{J}}(X; Y, Z) - I_q^{R_{J}}(X; Z),\]

where $I_q^{R_{J}}(X; Z)$ is the MIRenyiJizba mutual information.

Estimation
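
A minimal discrete sketch using the JointProbabilities estimator with a per-variable value-binning discretization (the binning choice is arbitrary):

using CausalityTools

x, y, z = rand(1000), rand(1000), rand(1000)
disc = CodifyVariables(ValueBinning(RectangularBinning(4)))
association(JointProbabilities(CMIRenyiJizba(q = 1.5), disc), x, y, z)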

CausalityTools.CMIRenyiPoczosType
CMIRenyiPoczos <: ConditionalMutualInformation
CMIRenyiPoczos(; base = 2, q = 1.5)

The differential Rényi conditional mutual information $I_q^{R_{P}}(X; Y | Z)$ defined in Poczos2012.

Usage

  • Use with association to compute the raw Rényi-Poczos conditional mutual information using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Poczos conditional mutual information.

Compatible estimators

Definition

\[\begin{align*} I_q^{R_{P}}(X; Y | Z) &= \dfrac{1}{q-1} \int \int \int \dfrac{p_Z(z) p_{X, Y | Z}^q}{( p_{X|Z}(x|z) p_{Y|Z}(y|z) )^{q-1}} \\ &= \mathbb{E}_{(X, Y, Z) \sim p_{X, Y, Z}} \left[ \dfrac{p_{X, Z}^{1-q}(X, Z) p_{Y, Z}^{1-q}(Y, Z) }{p_{X, Y, Z}^{1-q}(X, Y, Z) p_Z^{1-q}(Z)} \right] \end{align*}\]

Estimation

CausalityTools.CMIRenyiSarbuType
CMIRenyiSarbu <: ConditionalMutualInformation
CMIRenyiSarbu(; base = 2, q = 1.5)

The Rényi conditional mutual information from Sarbu2014.

Usage

  • Use with association to compute the raw Rényi-Sarbu conditional mutual information using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Sarbu conditional mutual information.

Compatible estimators

Discrete description

Assume we observe three discrete random variables $X$, $Y$ and $Z$. Sarbu (2014) defines discrete conditional Rényi mutual information as the conditional Rényi $\alpha$-divergence between the conditional joint probability mass function $p(x, y | z)$ and the product of the conditional marginals, $p(x |z) \cdot p(y|z)$:

\[I(X, Y; Z)^R_q = \dfrac{1}{q-1} \sum_{z \in Z} p(Z = z) \log \left( \sum_{x \in X}\sum_{y \in Y} \dfrac{p(x, y|z)^q}{\left( p(x|z)\cdot p(y|z) \right)^{q-1}} \right)\]

CausalityTools.CMIShannonType
CMIShannon <: ConditionalMutualInformation
CMIShannon(; base = 2)

The Shannon conditional mutual information (CMI) $I^S(X; Y | Z)$.

Usage

  • Use with association to compute the raw Shannon conditional mutual information using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Shannon conditional mutual information.

Compatible estimators

Supported definitions

Consider random variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$, given $Z \in \mathbb{R}^{d_Z}$. The Shannon conditional mutual information is defined as

\[\begin{align*} I(X; Y | Z) &= H^S(X, Z) + H^S(Y, Z) - H^S(X, Y, Z) - H^S(Z) \\ &= I^S(X; Y, Z) - I^S(X; Z) \end{align*},\]

where $I^S(\cdot; \cdot)$ is the Shannon mutual information MIShannon, and $H^S(\cdot)$ is the Shannon entropy.

Differential Shannon CMI is obtained by replacing the entropies by differential entropies.

Estimation
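
A minimal sketch using the FPVP nearest-neighbor estimator documented below (the chain x → y → z is chosen so that x and z are conditionally independent given y):

using CausalityTools
using Random; rng = MersenneTwister(1234)

x = rand(rng, 1000)
y = rand(rng, 1000) .+ x
z = rand(rng, 1000) .+ y

# I(x; z | y) should be near 0, since y screens off x from z
association(FPVP(CMIShannon(); k = 5), x, z, y)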

CausalityTools.CMITsallisPapapetrouType
CMITsallisPapapetrou <: ConditionalMutualInformation
CMITsallisPapapetrou(; base = 2, q = 1.5)

The Tsallis-Papapetrou conditional mutual information Papapetrou2020.

Usage

  • Use with association to compute the raw Tsallis-Papapetrou conditional mutual information using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Tsallis-Papapetrou conditional mutual information.

Compatible estimators

Definition

Tsallis-Papapetrou conditional mutual information is defined as

\[I_T^q(X, Y \mid Z) = \frac{1}{1 - q} \left( 1 - \sum_{XYZ} \frac{p(x, y, z)^q}{p(x \mid z)^{q-1} p(y \mid z)^{q-1} p(z)^{q-1}} \right).\]

CausalityTools.CodifyPointsType
CodifyPoints{N}
CodifyPoints(encodings::NTuple{N, Encoding})

CodifyPoints is a Discretization scheme that encodes input data points without applying any sequential transformation to the input (as opposed to CodifyVariables, which may apply some transformation before encoding).

Usage

  • Use with codify to encode/discretize input variables on a point-by-point basis.

Compatible encodings

Description

Given x::AbstractStateSpaceSet..., where the i-th dataset is assumed to represent a single series of measurements, CodifyPoints encodes each point pₖ ∈ x[i] using some Encoding(s), without applying any (sequential) transformation to the x[i] first. This behaviour is different to CodifyVariables, which does apply a transformation to x[i] before encoding.

If length(x) == N (i.e. there are N input datasets), then encodings must be a tuple of N Encodings. Alternatively, if encodings is a single Encoding, then that same encoding is applied to every x[i].

Examples

using CausalityTools

# The same encoding on two input datasets
x = StateSpaceSet(rand(100, 3))
y = StateSpaceSet(rand(100, 3))
encoding_ord = OrdinalPatternEncoding(3)
cx, cy = codify(CodifyPoints(encoding_ord), x, y)

# Different encodings on multiple datasets
z = StateSpaceSet(rand(100, 2))
encoding_bin = RectangularBinEncoding(RectangularBinning(3), z)
d = CodifyPoints(encoding_ord, encoding_ord, encoding_bin)
cx, cy, cz = codify(d, x, y, z)
CausalityTools.CodifyVariablesType
CodifyVariables <: Discretization
CodifyVariables(outcome_space::OutcomeSpace)

The CodifyVariables discretization scheme quantises input data in a column-wise manner using the given outcome_space.

Compatible outcome spaces

Description

The main difference between CodifyVariables and CodifyPoints is that the former uses OutcomeSpaces for discretization. This usually means that some transformation is applied to the data before discretizing. For example, some outcome spaces construct a delay embedding from the input (and thus encode sequential information) before encoding the data.

Specifically, given x::AbstractStateSpaceSet..., where the i-th dataset x[i] is assumed to represent a single series of measurements, CodifyVariables encodes each x[i] by codify-ing it into a series of integers using an appropriate OutcomeSpace. This is typically done by first sequentially transforming the data, then running a sliding window (whose width is controlled by outcome_space) across the data, and encoding the values within each window to an integer.

Examples

using CausalityTools
x, y = rand(100), rand(100)
d = CodifyVariables(OrdinalPatterns(m=2))
cx, cy = codify(d, x, y)
CausalityTools.ConditionalEntropyShannonType
ConditionalEntropyShannon <: ConditionalEntropy
ConditionalEntropyShannon(; base = 2)

The Shannon conditional entropy measure.

Usage

  • Use with association to compute the Shannon conditional entropy between two variables.

Compatible estimators

Discrete definition

Sum formulation

The conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as

\[H^{S}(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log(p(x | y)).\]

This is the definition used when calling association with a JointProbabilities estimator.

Two-entropies formulation

Equivalently, the following difference of entropies holds

\[H^S(X | Y) = H^S(X, Y) - H^S(Y),\]

where $H^S(\cdot)$ and $H^S(\cdot | \cdot)$ are the Shannon entropy and Shannon joint entropy, respectively. This is the definition used when calling association with a ProbabilitiesEstimator.

Differential definition

The differential conditional Shannon entropy is analogously defined as

\[H^S(X | Y) = h^S(X, Y) - h^S(Y),\]

where $h^S(\cdot)$ and $h^S(\cdot | \cdot)$ are the Shannon differential entropy and Shannon joint differential entropy, respectively. This is the definition used when calling association with a DifferentialInfoEstimator.

Estimation
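
A minimal discrete sketch using the JointProbabilities estimator with a per-variable value-binning discretization (the binning choice is arbitrary):

using CausalityTools

x, y = rand(1000), rand(1000)
disc = CodifyVariables(ValueBinning(RectangularBinning(4)))
association(JointProbabilities(ConditionalEntropyShannon(), disc), x, y)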

CausalityTools.ConditionalEntropyTsallisAbeType
ConditionalEntropyTsallisAbe <: ConditionalEntropy
ConditionalEntropyTsallisAbe(; base = 2, q = 1.5)

Abe2001's discrete Tsallis conditional entropy measure.

Usage

  • Use with association to compute the Tsallis-Abe conditional entropy between two variables.

Compatible estimators

Definition

Abe & Rajagopal's Tsallis conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as

\[H_q^{T_A}(X | Y) = \dfrac{H_q^T(X, Y) - H_q^T(Y)}{1 + (1-q)H_q^T(Y)},\]

where $H_q^T(\cdot)$ and $H_q^T(\cdot, \cdot)$ are the Tsallis entropy and the joint Tsallis entropy, respectively.

Estimation

CausalityTools.ConditionalEntropyTsallisFuruichiType
ConditionalEntropyTsallisFuruichi <: ConditionalEntropy
ConditionalEntropyTsallisFuruichi(; base = 2, q = 1.5)

Furuichi (2006)'s discrete Tsallis conditional entropy definition.

Usage

  • Use with association to compute the Tsallis-Furuichi conditional entropy between two variables.

Compatible estimators

Definition

Furuichi's Tsallis conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as

\[H_q^T(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y)^q \log_q(p(x | y)),\]

where $\ln_q(x) = \frac{x^{1-q} - 1}{1 - q}$ and $q \neq 1$. For $q = 1$, $H_q^T(X | Y)$ reduces to the Shannon conditional entropy:

\[H_{q=1}^T(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log(p(x | y))\]

If any of the entries of the marginal distribution for Y are zero, or the q-logarithm is undefined for a particular value, then the measure is undefined and NaN is returned.

Estimation

CausalityTools.ConvergentCrossMappingType
ConvergentCrossMapping <: CrossmapMeasure
ConvergentCrossMapping(; d::Int = 2, τ::Int = -1, w::Int = 0,
    f = Statistics.cor, embed_warn = true)

The convergent cross mapping measure Sugihara2012.

Usage

Compatible estimators

Description

The Theiler window w controls how many temporal neighbors are excluded during neighbor searches (w = 0 means that only the point itself is excluded). f is a function that computes the agreement between observations and predictions (the default, f = Statistics.cor, gives the Pearson correlation coefficient).

Embedding

Let S(i) be the source time series variable and T(i) be the target time series variable. This version produces regular embeddings with fixed dimension d and embedding lag τ as follows:

\[( S(i), S(i+\tau), S(i+2\tau), \ldots, S(i+(d-1)\tau), T(i))_{i=1}^{N-(d-1)\tau}.\]

In this joint embedding, neighbor searches are performed in the subspace spanned by the first D-1 variables, while the last (D-th) variable is to be predicted.

With this convention, τ < 0 implies "past/present values of source used to predict target", and τ > 0 implies "future/present values of source used to predict target". The latter case may not be meaningful for many applications, so by default, a warning will be given if τ > 0 (embed_warn = false turns off warnings).

Estimation
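
A minimal sketch, assuming the RandomVectors estimator follows the same constructor pattern as ExpandingSegment documented below (the definition as first argument, plus libsizes and rng keywords):

using CausalityTools
using Random; rng = MersenneTwister(1234)

x = rand(rng, 300)
y = 0.5 .* x .+ rand(rng, 300)

# one cross-map estimate per library size in libsizes
est = RandomVectors(ConvergentCrossMapping(d = 3); libsizes = 50:50:200, rng = rng)
association(est, x, y)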

CausalityTools.CorrTestType
CorrTest <: IndependenceTest
CorrTest()

An independence test based on correlation (for two variables) and partial correlation (for three variables) Levy1978; as described in Schmidt2018.

Uses PearsonCorrelation and PartialCorrelation internally.

Assumes that the input data are (multivariate) normally distributed. Then ρ(X, Y) = 0 implies X ⫫ Y and ρ(X, Y | 𝐙) = 0 implies X ⫫ Y | 𝐙.

Description

The null hypothesis is H₀ := ρ(X, Y | 𝐙) = 0. We use the approach in Levy & Narula (1978)Levy1978 and compute the Z-transformation of the observed (partial) correlation coefficient $\hat{\rho}_{XY|\bf{Z}}$:

\[Z(\hat{\rho}_{XY|\bf{Z}}) = \log\dfrac{1 + \hat{\rho}_{XY|\bf{Z}}}{1 - \hat{\rho}_{XY|\bf{Z}}}.\]

To test the null hypothesis against the alternative hypothesis H₁ := ρ(X, Y | 𝐙) > 0, calculate

\[\hat{Z} = \dfrac{1}{2}\dfrac{Z(\hat{\rho}_{XY|\bf{Z}}) - Z(0)}{\sqrt{1/(n - d - 3)}},\]

and compute the two-sided p-value (Schmidt et al., 2018)

\[p(X, Y | \bf{Z}) = 2(1 - \phi(\sqrt{n - d - 3}Z(\hat{\rho}_{XY|\bf{Z}}))),\]

where $d$ is the dimension of $\bf{Z}$ and $n$ is the number of samples. For the pairwise case, the procedure is identical, but set $\bf{Z} = \emptyset$.

Examples

  • Example 1. Pairwise and conditional tests for independence on coupled noise processes.
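
A minimal sketch of both the pairwise and conditional variants, assuming a simple chain x → y → z:

using CausalityTools

x = randn(1000)
y = x .+ randn(1000)
z = y .+ randn(1000)

independence(CorrTest(), x, z)     # expect a small p-value (dependence)
independence(CorrTest(), x, z, y)  # expect a large p-value (x and z are conditionally independent given y)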
CausalityTools.CorrTestResultType
CorrTestResult(pvalue, ρ, z)

A simple struct that holds the results of a CorrTest test: the (partial) correlation coefficient ρ, Fisher's z, and pvalue - the two-sided p-value for the test.

CausalityTools.CrossmapEstimatorType
CrossmapEstimator{M<:CrossmapMeasure, LIBSIZES, RNG}

The abstract supertype for all cross-map estimators.

Concrete subtypes

Description

Because the type of the library may differ between estimators, and because RNGs from different packages may be used, subtypes must implement the LIBSIZES and RNG type parameters.

For efficiency purposes, subtypes may contain mutable containers that can be re-used for ensemble analysis (see Ensemble).

Libraries

A cross-map estimator uses the concept of "libraries". A library is essentially just a reference to a set of points, and usually, a library refers to indices of points, not the actual points themselves.

For example, for timeseries, RandomVectors(libsizes = 50:25:100) produces three separate libraries, where the first contains 50 randomly selected time indices, the second contains 75 randomly selected time indices, and the third contains 100 randomly selected time indices. This of course assumes that all quantities involved can be indexed using the same time indices, meaning that the concept of "library" only makes sense after relevant quantities have been jointly embedded, so that they can be jointly indexed. For non-instantaneous prediction, the maximum possible library size shrinks with the magnitude of the index/time-offset for the prediction.

For spatial analyses (not yet implemented), indices could be more complex and involve multi-indices.

CausalityTools.DistanceCorrelationType
DistanceCorrelation

The distance correlation Szekely2007 measure quantifies potentially nonlinear associations between pairs of variables. If applied to three variables, the partial distance correlation Szekely2014 is computed.

Usage

  • Use with association to compute the raw (partial) distance correlation coefficient.
  • Use with independence to perform a formal hypothesis test for pairwise dependence.

Description

The distance correlation can be used to compute the association between two variables, or the conditional association between three variables, like so:

association(DistanceCorrelation(), x, y) → dcor ∈ [0, 1]
association(DistanceCorrelation(), x, y, z) → pdcor

With two variables, we compute dcor, which is called the empirical/sample distance correlation Szekely2007. With three variables, the partial distance correlation pdcor is computed Szekely2014.

Warn

A partial distance correlation distance_correlation(X, Y, Z) = 0 doesn't always guarantee conditional independence X ⫫ Y | Z. See Szekely2014 for an in-depth discussion.

CausalityTools.DivergenceOrDistanceType
DivergenceOrDistance <: BivariateInformationMeasure

The supertype for bivariate information measures aiming to quantify some sort of divergence, distance or closeness between two probability distributions.

Some of these measures are proper metrics, while others are not, but they have in common that they aim to quantify how "far from each other" two probability distributions are.

Concrete implementations

CausalityTools.EmbeddingTEType
EmbeddingTE(; dS = 1, dT = 1, dTf = 1, dC = 1, τS = -1, τT = -1, ηTf = 1, τC = -1)
EmbeddingTE(opt::OptimiseTraditional, s, t, [c])

EmbeddingTE provides embedding parameters for transfer entropy analysis using either TEShannon, TERenyiJizba, or in general any subtype of TransferEntropy.

The second method finds parameters using the "traditional" optimised embedding techniques from DynamicalSystems.jl.

Convention for generalized delay reconstruction

We use the following convention. Let $s(i)$ be time series for the source variable, $t(i)$ be the time series for the target variable and $c(i)$ the time series for the conditional variable. To compute transfer entropy, we need the following marginals:

\[\begin{aligned} T^{+} &= \{ t(i+\eta^1), t(i+\eta^2), \ldots, t(i+\eta^{d_{T^{+}}}) \} \\ T^{-} &= \{ t(i+\tau^0_{T}), t(i+\tau^1_{T}), t(i+\tau^2_{T}), \ldots, t(i + \tau^{d_{T} - 1}_{T}) \} \\ S^{-} &= \{ s(i+\tau^0_{S}), s(i+\tau^1_{S}), s(i+\tau^2_{S}), \ldots, s(i + \tau^{d_{S} - 1}_{S}) \} \\ C^{-} &= \{ c(i+\tau^0_{C}), c(i+\tau^1_{C}), c(i+\tau^2_{C}), \ldots, c(i + \tau^{d_{C} - 1}_{C}) \} \end{aligned}\]

Depending on the application, the delay reconstruction lags $\tau^k_{T} \leq 0$, $\tau^k_{S} \leq 0$, and $\tau^k_{C} \leq 0$ may be equally spaced or non-equally spaced. The same applies to the prediction lag(s), but typically only a single prediction lag $\eta^k$ is used (so that $d_{T^{+}} = 1$).

For transfer entropy, traditionally at least one $\tau^k_{T}$, one $\tau^k_{S}$ and one $\tau^k_{C}$ equals zero. This way, the $T^{-}$, $S^{-}$ and $C^{-}$ marginals always contain present/past states, while the $\mathcal{T}$ marginal contains future states relative to the other marginals. However, this is not a strict requirement, and modern approaches that search for optimal embeddings can return embeddings without the instantaneous lag.

Combined, we get the generalized delay reconstruction $\mathbb{E} = (T^{+}_{(d_{T^{+}})}, T^{-}_{(d_{T})}, S^{-}_{(d_{S})}, C^{-}_{(d_{C})})$. Transfer entropy is then computed as

\[\begin{aligned} TE_{S \rightarrow T | C} = \int_{\mathbb{E}} P(T^{+}, T^-, S^-, C^-) \log_{b}{\left(\frac{P(T^{+} | T^-, S^-, C^-)}{P(T^{+} | T^-, C^-)}\right)}, \end{aligned}\]

or, if conditionals are not relevant,

\[\begin{aligned} TE_{S \rightarrow T} = \int_{\mathbb{E}} P(T^{+}, T^-, S^-) \log_{b}{\left(\frac{P(T^{+} | T^-, S^-)}{P(T^{+} | T^-)}\right)}, \end{aligned}\]

Here,

  • $T^{+}$ denotes the $d_{T^{+}}$-dimensional set of vectors furnishing the future states of $T$ (almost always equal to 1 in practical applications),
  • $T^{-}$ denotes the $d_{T}$-dimensional set of vectors furnishing the past and present states of $T$,
  • $S^{-}$ denotes the $d_{S}$-dimensional set of vectors furnishing the past and present of $S$, and
  • $C^{-}$ denotes the $d_{C}$-dimensional set of vectors furnishing the past and present of $C$.

Keyword arguments

  • dS, dT, dC, dTf (f for future) are the dimensions of the $S^{-}$, $T^{-}$, $C^{-}$ and $T^{+}$ marginals. The parameters dS, dT, dC and dTf must each be a positive integer.
  • τS, τT, τC are the embedding lags for $S^{-}$, $T^{-}$, $C^{-}$. Each parameter is an integer ∈ 𝒩⁰⁻, or a vector of integers ∈ 𝒩⁰⁻, so that $S^{-}$, $T^{-}$, $C^{-}$ always represent present/past values. If e.g. τT is an integer, then the $T^-$ marginal is constructed using lags $\tau_{T} = \{0, \tau, 2\tau, \ldots, (d_{T}- 1)\tau_T \}$. If it is a vector, e.g. τΤ = [-1, -5, -7], then the dimension dT must match the lags, and precisely those lags are used: $\tau_{T} = \{-1, -5, -7 \}$.
  • The prediction lag(s) ηTf is a positive integer. Combined with the requirement that the other delay parameters are zero or negative, this ensures that we're always predicting from past/present to future. In typical applications, ηTf = 1 is used for transfer entropy.

Examples

Say we wanted to compute the Shannon transfer entropy $TE^S(S \to T) = I^S(T^+; S^- | T^-)$. Using some modern procedure for determining optimal embedding parameters using methods from DynamicalSystems.jl, we find that the optimal embedding of $T^{-}$ is three-dimensional and is given by the lags [0, -5, -8]. Using the same procedure, we find that the optimal embedding of $S^{-}$ is two-dimensional with lags $[-1, -8]$. We want to predict a univariate version of the target variable one time step into the future (ηTf = 1). The total embedding is then the set of embedding vectors

$E_{TE} = \{ (T(i+1), S(i-1), S(i-8), T(i), T(i-5), T(i-8)) \}$. Translating this to code, we get:

using CausalityTools
EmbeddingTE(dT=3, τT=[0, -5, -8], dS=2, τS=[-1, -8], ηTf=1)

# output
EmbeddingTE(dS=2, dT=3, dC=1, dTf=1, τS=[-1, -8], τT=[0, -5, -8], τC=-1, ηTf=1)
CausalityTools.EntropyDecompositionType
EntropyDecomposition(definition::MultivariateInformationMeasure, 
    est::DifferentialInfoEstimator)
EntropyDecomposition(definition::MultivariateInformationMeasure,
    est::DiscreteInfoEstimator,
    discretization::CodifyVariables{<:OutcomeSpace},
    pest::ProbabilitiesEstimator = RelativeAmount())

Estimate the multivariate information measure specified by definition by rewriting its formula into some combination of entropy terms.

If calling the second method (discrete variant), then discretization is always done per variable/column and each column is encoded into integers using codify.

Usage

Description

The entropy terms are estimated using est, and then combined to form the final estimate of definition. No bias correction is applied. If est is a DifferentialInfoEstimator, then discretization and pest are ignored. If est is a DiscreteInfoEstimator, then discretization and a probabilities estimator pest must also be provided (pest defaults to RelativeAmount, which uses naive plug-in probabilities).

Compatible differential information estimators

If using the first signature, any compatible DifferentialInfoEstimator can be used.

Compatible outcome spaces for discrete estimation

If using the second signature, the following outcome spaces can be used for discretisation. Note that not all outcome spaces will work with all measures.

| Estimator | Principle |
|-----------|-----------|
| UniqueElements | Count of unique elements |
| ValueBinning | Binning (histogram) |
| OrdinalPatterns | Ordinal patterns |
| Dispersion | Dispersion patterns |
| BubbleSortSwaps | Sorting complexity |
| CosineSimilarityBinning | Cosine similarities histogram |

Bias

Estimating the definition by decomposition into a combination of entropy terms, which are estimated independently, will in general be more biased than using a dedicated estimator. One reason is that this decomposition may miss out on crucial information in the joint space. To remedy this, dedicated information measure estimators typically derive the marginal estimates by first considering the joint space, and then do some clever trick to eliminate the bias that is introduced through a naive decomposition. Unless specified below, no bias correction is applied for EntropyDecomposition.

Handling of overlapping parameters

If there are overlapping parameters between the measure to be estimated, and the lower-level decomposed measures, then the top-level measure parameter takes precedence. For example, if we want to estimate CMIShannon(base = 2) through a decomposition of entropies using the Kraskov(Shannon(base = ℯ)) Shannon entropy estimator, then base = 2 is used.

Info

Not all measures have the property that they can be decomposed into more fundamental information theoretic quantities. For example, MITsallisMartin can be decomposed into a combination of marginal entropies, while MIRenyiSarbu cannot. An error will be thrown if decomposition is not possible.

Discrete entropy decomposition

The second signature is for discrete estimation using DiscreteInfoEstimators, for example PlugIn. The given discretization scheme (typically an OutcomeSpace) controls how the joint/marginals are discretized, and the probabilities estimator pest controls how probabilities are estimated from counts.

Bias

Like for DifferentialInfoEstimator, using a dedicated estimator for the measure in question will be more reliable than using a decomposition estimate. Here's how different discretizations are applied:

  • ValueBinning. Bin visitation frequencies are counted in the joint space XY, then marginal visitations are obtained from the joint bin visits. This behaviour is the same for both FixedRectangularBinning and RectangularBinning (which adapts the grid to the data). When using FixedRectangularBinning, the range along the first dimension is used as a template for all other dimensions. This is a bit slower than naively binning each marginal, but lessens bias.
  • OrdinalPatterns. Each timeseries is separately codify-ed according to its ordinal pattern (no bias correction).
  • Dispersion. Each timeseries is separately codify-ed according to its dispersion pattern (no bias correction).

Examples
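
Two minimal sketches of the differential and discrete signatures (the estimator and outcome-space choices are arbitrary):

using CausalityTools

x, y = rand(1000), rand(1000)

# differential: Shannon MI as a combination of Kraskov-estimated entropies
association(EntropyDecomposition(MIShannon(), Kraskov(Shannon(); k = 5)), x, y)

# discrete: plug-in entropies over a per-variable ordinal-pattern discretization
disc = CodifyVariables(OrdinalPatterns(m = 3))
association(EntropyDecomposition(MIShannon(), PlugIn(Shannon()), disc), x, y)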

See also: MutualInformationEstimator, MultivariateInformationMeasure.

CausalityTools.ExpandingSegmentType
ExpandingSegment <: CrossmapEstimator
ExpandingSegment(definition::CrossmapMeasure; libsizes, rng = Random.default_rng())

Cross map once over N = length(libsizes) different "point libraries", where point indices are selected as time-contiguous segments/windows.

This is the method from Sugihara2012. See CrossmapEstimator for an in-depth explanation of what "library" means in this context.

Description

Point index segments are selected as the first available data point index up to the L-th data point index. This results in one library of contiguous time indices per L ∈ libsizes.

If used in an ensemble setting, the estimator is applied to time indices Lmin:step:Lmax of the joint embedding.

Returns

The return type when used with association depends on the type of libsizes.

  • If libsizes is an Int (a single library), then a single cross-map estimate is returned.
  • If libsizes is an AbstractVector{Int} (multiple libraries), then a vector of cross-map estimates is returned, one per library (see the sketch below).
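
A minimal sketch (library sizes are arbitrary):

using CausalityTools

x, y = rand(300), rand(300)
est = ExpandingSegment(ConvergentCrossMapping(d = 2); libsizes = 50:50:200)
association(est, x, y)  # one cross-map estimate per library size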
CausalityTools.FPVPType
FPVP <: ConditionalMutualInformationEstimator
FPVP(definition = CMIShannon(); k = 1, w = 0)

The Frenzel-Pompe-Vejmelka-Paluš (or FPVP for short) ConditionalMutualInformationEstimator is used to estimate the conditional mutual information using a k-th nearest neighbor approach that is analogous to that of the KraskovStögbauerGrassberger1 mutual information estimator from Frenzel2007 and Vejmelka2008.

k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.

Compatible definitions

Usage

Examples
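
A minimal sketch (parameter values are arbitrary):

using CausalityTools

x, y, z = rand(1000), rand(1000), rand(1000)
association(FPVP(CMIShannon(); k = 5), x, y, z)  # should be near 0 (and can be negative)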

CausalityTools.GaoKannanOhViswanathType
GaoKannanOhViswanath <: MutualInformationEstimator
GaoKannanOhViswanath(; k = 1, w = 0)

The GaoKannanOhViswanath (Shannon) estimator is designed for estimating Shannon mutual information between variables that may be either discrete, continuous or a mixture of both GaoKannanOhViswanath2017.

Compatible definitions

Usage

  • Use with association to compute Shannon mutual information from input data.
  • Use with some IndependenceTest to test for independence between variables.

Description

The estimator starts by expressing mutual information in terms of the Radon-Nikodym derivative, and then estimates these derivatives using k-nearest neighbor distances from empirical samples.

The estimator avoids the common issue of having to add noise to data before analysis due to tied points, which may bias other estimators. Citing their paper, the estimator "strongly outperforms natural baselines of discretizing the mixed random variables (by quantization) or making it continuous by adding a small Gaussian noise."

Implementation note

In GaoKannanOhViswanath2017, they claim (roughly speaking) that the estimator reduces to the KraskovStögbauerGrassberger1 estimator for continuous-valued data. However, KraskovStögbauerGrassberger1 uses the digamma function, while GaoKannanOhViswanath uses the logarithm instead, so the estimators are not exactly equivalent for continuous data.

Moreover, in their algorithm 1, it is clearly not the case that the method falls back on the KraskovStögbauerGrassberger1 approach. The KraskovStögbauerGrassberger1 estimator uses k-th neighbor distances in the joint space, while the GaoKannanOhViswanath algorithm selects the maximum k-th nearest distances among the two marginal spaces, which are in general not the same as the k-th neighbor distance in the joint space (unless both marginals are univariate). Therefore, our implementation here differs slightly from algorithm 1 in GaoKannanOhViswanath. We have modified it in a way that mimics KraskovStögbauerGrassberger1 for continuous data. Note that because of using the log function instead of digamma, there will be slight differences between the methods. See the source code for more details.

Explicitly convert your discrete data to floats

Even though the GaoKannanOhViswanath estimator is designed to handle discrete data, our implementation demands that all input data are StateSpaceSets whose data points are floats. If you have discrete data, such as strings or symbols, encode them using integers and convert those integers to floats before passing them to association.

Examples

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoKannanOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)
CausalityTools.GaoOhViswanathType
GaoOhViswanath <: MutualInformationEstimator

The GaoOhViswanath is a mutual information estimator based on nearest neighbors, and is also called the bias-improved-KSG estimator, or BI-KSG, by Gao2018.

Compatible definitions

Usage

  • Use with association to compute Shannon mutual information from input data.
  • Use with some IndependenceTest to test for independence between variables.

Description

The estimator is given by

\[\begin{align*} \hat{H}_{GAO}(X, Y) &= \hat{H}_{KSG}(X) + \hat{H}_{KSG}(Y) - \hat{H}_{KZL}(X, Y) \\ &= \psi{(k)} + \log{(N)} + \log{ \left( \dfrac{c_{d_{x}, 2} c_{d_{y}, 2}}{c_{d_{x} + d_{y}, 2}} \right) } - \\ & \dfrac{1}{N} \sum_{i=1}^N \left( \log{(n_{x, i, 2})} + \log{(n_{y, i, 2})} \right) \end{align*},\]

where $c_{d, 2} = \dfrac{\pi^{\frac{d}{2}}}{\Gamma{(\dfrac{d}{2} + 1)}}$ is the volume of a $d$-dimensional unit $\ell_2$-ball.

Example

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)
CausalityTools.GaussianCMIType
GaussianCMI <: ConditionalMutualInformationEstimator
GaussianCMI(definition = CMIShannon(); normalize::Bool = false)

GaussianCMI is a parametric ConditionalMutualInformationEstimator Vejmelka2008.

Compatible definitions

Usage

Description

GaussianCMI estimates Shannon CMI as a difference of two mutual information terms, each estimated using GaussianMI (the normalize keyword is the same as for GaussianMI):

\[\hat{I}_{Gaussian}(X; Y | Z) = \hat{I}_{Gaussian}(X; Y, Z) - \hat{I}_{Gaussian}(X; Z)\]

Examples
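
A minimal sketch for independent normally distributed variables:

using CausalityTools

x, y, z = randn(1000), randn(1000), randn(1000)
association(GaussianCMI(CMIShannon()), x, y, z)  # should be near 0 (and can be negative)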

CausalityTools.GaussianMIType
GaussianMI <: MutualInformationEstimator
GaussianMI(; normalize::Bool = false)

GaussianMI is a parametric estimator for Shannon mutual information.

Compatible definitions

Usage

  • Use with association to compute Shannon mutual information from input data.
  • Use with some IndependenceTest to test for independence between variables.

Description

Given $d_x$-dimensional and $d_y$-dimensional input data X and Y, GaussianMI first constructs the $d_x + d_y$-dimensional joint StateSpaceSet XY. If normalize == true, then we follow the approach in Vejmelka & Palus (2008)Vejmelka2008 and transform each column in XY to have zero mean and unit standard deviation. If normalize == false, then the algorithm proceeds without normalization.

Next, $\Sigma$, the correlation matrix of the (possibly normalized) joint data XY, is computed. GaussianMI assumes the input variables are distributed according to normal distributions with zero means and unit standard deviations.

The mutual information (for normalize == false) is then estimated as

\[\hat{I}^S_{Gaussian}(X; Y) = \dfrac{1}{2} \log \left( \dfrac{ \det(\Sigma_X) \det(\Sigma_Y) }{\det(\Sigma)} \right),\]

where $\Sigma_X$ and $\Sigma_Y$ appear in $\Sigma$ as

\[\Sigma = \begin{bmatrix} \Sigma_{X} & \Sigma^{'}\\ \Sigma^{'} & \Sigma_{Y} \end{bmatrix}.\]

If normalize == true, then the mutual information is estimated as

\[\hat{I}^S_{Gaussian}(X; Y) = -\dfrac{1}{2} \sum_{i = 1}^{d_x + d_y} \log(\sigma_i),\]

where $\sigma_i$ are the eigenvalues for $\Sigma$.

Example

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaussianMI(), x, y) # should be near 0 (and can be negative)
CausalityTools.GraphAlgorithmType
GraphAlgorithm

The supertype of all causal graph inference algorithms.

Concrete implementations

  • OCE. The optimal causation entropy algorithm for time series graphs.
  • PC.
CausalityTools.HMeasureType
HMeasure <: AssociationMeasure
HMeasure(; K::Int = 2, dx = 2, dy = 2, τx = -1, τy = -1, w = 0)

The HMeasure Arnhold1999 is a pairwise association measure. It quantifies the probability with which close states of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.

Note that τx and τy are negative by convention. See docstring for SMeasure for an explanation.

Usage

  • Use with association to compute the raw h-measure statistic.
  • Use with independence to perform a formal hypothesis test for directional dependence.

Description

The HMeasure Arnhold1999 is similar to the SMeasure, but the numerator of the formula is replaced by $R_i(x)$, the mean squared Euclidean distance to all other points, and there is a $\log$-term inside the sum:

\[H^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{R_i(x)}{R_i^{(k)}(x|y)} \right).\]

Parameters are the same and $R_i^{(k)}(x|y)$ is computed as for SMeasure.

See also: ClosenessMeasure.

CausalityTools.HellingerDistanceType
HellingerDistance <: DivergenceOrDistance

The Hellinger distance.

Usage

  • Use with association to compute the Hellinger distance between two pre-computed probability distributions, or from raw data using one of the estimators listed below.

Compatible estimators

Description

The Hellinger distance between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace $\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as

\[D_{H}(P_X(\Omega) || P_Y(\Omega)) = \dfrac{1}{\sqrt{2}} \sum_{\omega \in \Omega} (\sqrt{p_x(\omega)} - \sqrt{p_y(\omega)})^2\]

Estimation
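
A minimal sketch using the JointProbabilities estimator with a per-variable ordinal-pattern discretization (the outcome-space choice is arbitrary):

using CausalityTools

x, y = rand(1000), rand(1000)
disc = CodifyVariables(OrdinalPatterns(m = 3))
association(JointProbabilities(HellingerDistance(), disc), x, y)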

CausalityTools.HilbertType
Hilbert(est;
    source::InstantaneousSignalProperty = Phase(),
    target::InstantaneousSignalProperty = Phase(),
    cond::InstantaneousSignalProperty = Phase()
) <: TransferDifferentialEntropyEstimator

Compute transfer entropy on instantaneous phases/amplitudes of relevant signals, which are obtained by first applying the Hilbert transform to each signal, then extracting the phases/amplitudes of the resulting complex numbers Palus2014. Original time series are thus transformed to instantaneous phase/amplitude time series. Transfer entropy is then estimated using the provided est on those phases/amplitudes (use e.g. ValueBinning, or OrdinalPatterns).

Info

Details on estimation of the transfer entropy (conditional mutual information) following the phase/amplitude extraction step is not given in Palus (2014). Here, after instantaneous phases/amplitudes have been obtained, these are treated as regular time series, from which transfer entropy is then computed as usual.

See also: Phase, Amplitude.

CausalityTools.JDDTestResultType
JDDTestResult(Δjdd, hypothetical_μ, pvalue)

Holds the results of JointDistanceDistributionTest. Δjdd is the Δ-distribution, hypothetical_μ is the hypothetical mean of the Δ-distribution under the null, and pvalue is the p-value for the one-sided t-test.

CausalityTools.JointDistanceDistributionType
JointDistanceDistribution <: AssociationMeasure
JointDistanceDistribution(; metric = Euclidean(), B = 10, D = 2, τ = -1, μ = 0.0)

The joint distance distribution (JDD) measure Amigo2018.

Usage

  • Use with association to compute the joint distance distribution measure Δ from Amigo2018.
  • Use with independence to perform a formal hypothesis test for directional dependence.

Keyword arguments

  • metric::Metric: An instance of a valid distance metric from Distances.jl. Defaults to Euclidean().
  • B::Int: The number of equidistant subintervals to divide the interval [0, 1] into when comparing the normalised distances.
  • D::Int: Embedding dimension.
  • τ::Int: Embedding delay. By convention, τ is negative.
  • μ: The hypothetical mean value of the joint distance distribution if there is no coupling between x and y (default is μ = 0.0).

Description

From input time series $x(t)$ and $y(t)$, we first construct the delay embeddings (note the positive sign in the embedding lags; therefore the input parameter τ is by convention negative).

\[\begin{align*} \{\bf{x}_i \} &= \{(x_i, x_{i+\tau}, \ldots, x_{i+(d_x - 1)\tau}) \} \\ \{\bf{y}_i \} &= \{(y_i, y_{i+\tau}, \ldots, y_{i+(d_y - 1)\tau}) \} \\ \end{align*}\]

The algorithm then proceeds to analyze the distribution of distances between points of these embeddings, as described in Amigo2018.

Examples
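
A minimal sketch (parameter values are arbitrary; the return value is the Δ-distribution):

using CausalityTools

x = rand(1000)
y = 0.5 .* x .+ rand(1000)
association(JointDistanceDistribution(D = 3, B = 10), x, y)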

CausalityTools.JointDistanceDistributionTestType
JointDistanceDistributionTest <: IndependenceTest
JointDistanceDistributionTest(measure::JointDistanceDistribution; rng = Random.default_rng())

An independence test for two variables based on the JointDistanceDistribution Amigo2018.

When used with independence, a JDDTestResult is returned.

Description

The joint distance distribution (labelled Δ in their paper) is used by Amigó & Hirata (2018) to detect directional couplings of the form $X \to Y$ or $Y \to X$. JointDistanceDistributionTest formulates their method as an independence test.

Formally, we test the hypothesis $H_0$ (the variables are independent) against $H_1$ (there is directional coupling between the variables). To do so, we use a right-sided/upper-tailed t-test to check whether the mean of Δ is skewed towards positive values, i.e.

  • $H_0 := \mu(\Delta) = 0$
  • $H_1 := \mu(\Delta) > 0$.

When used with independence, a JDDTestResult is returned, which contains the joint distance distribution and a p-value. If you only need Δ, use association with a JointDistanceDistribution instance directly.

Examples

  • Example 1. Detecting (in)dependence in bidirectionally coupled logistic maps.
CausalityTools.JointEntropyRenyiType
JointEntropyRenyi <: JointEntropy
JointEntropyRenyi(; base = 2, q = 1.5)

The Rényi joint entropy measure Golshani2009.

Usage

  • Use with association to compute the Golshani-Rényi joint entropy between two variables.

Compatible estimators

Definition

Given two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{Y}$, Golshani2009 defines the Rényi joint entropy as

\[H_q^R(X, Y) = \dfrac{1}{1-q} \log \sum_{i = 1}^N p_i^q,\]

where $q > 0$ and $q \neq 1$.

Estimation

CausalityTools.JointEntropyShannonType
JointEntropyShannon <: JointEntropy
JointEntropyShannon(; base = 2)

The Shannon joint entropy measure CoverThomas1999.

Usage

  • Use with association to compute the Shannon joint entropy between two variables.

Compatible estimators

Definition

Given two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{Y}$, CoverThomas1999 defines the Shannon joint entropy as

\[H^S(X, Y) = -\sum_{x\in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log p(x, y),\]

where we define $\log(p(x, y)) := 0$ if $p(x, y) = 0$.

Estimation

CausalityTools.JointEntropyTsallisType
JointEntropyTsallis <: JointEntropy
JointEntropyTsallis(; base = 2, q = 1.5)

The Tsallis joint entropy definition from Furuichi2006.

Usage

  • Use with association to compute the Furuichi-Tsallis joint entropy between two variables.

Compatible estimators

Definition

Given two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{Y}$, Furuichi2006 defines the Tsallis joint entropy as

\[H_q^T(X, Y) = -\sum_{x\in \mathcal{X}, y \in \mathcal{Y}} p(x, y)^q \log_q p(x, y),\]

where $\log_q(x) = \dfrac{x^{1-q} - 1}{1-q}$ is the q-logarithm, and we define $\log_q(0) := 0$.

Estimation

CausalityTools.JointProbabilitiesType
JointProbabilities <: InformationMeasureEstimator
JointProbabilities(
    definition::MultivariateInformationMeasure,
    discretization::Discretization
)

JointProbabilities is a generic estimator for multivariate discrete information measures.

Usage

  • Use with association to compute an information measure from input data.

Description

It first encodes the input data according to the given discretization, then constructs probs, a multidimensional Probabilities instance. Finally, probs are forwarded to a PlugIn estimator, which computes the measure according to definition.

Compatible encoding schemes

  • CodifyVariables (encode each variable/column of the input data independently by applying an encoding in a sliding window over each input variable).
  • CodifyPoints (encode each point of the input data)

Works for any OutcomeSpace that implements codify.
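
For example, a minimal sketch estimating Shannon mutual information with a per-variable value-binning discretization (the binning choice is arbitrary):

using CausalityTools

x, y = rand(1000), rand(1000)
disc = CodifyVariables(ValueBinning(RectangularBinning(4)))
association(JointProbabilities(MIShannon(), disc), x, y)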

Joint probabilities vs decomposition methods

Using JointProbabilities to compute an information measure, e.g. conditional mutual information, is typically slower than other dedicated estimation procedures like EntropyDecomposition. The reason is that measures such as CMIShannon can be formulated as a sum of four entropies, which can be estimated individually and summed afterwards. This decomposition is fast because we avoid explicitly estimating the entire joint pmf, which demands many extra calculation steps. However, the decomposition is biased, because it fails to fully take into consideration the joint relationships between the variables. Pick your estimator according to your needs.

See also: Counts, Probabilities, ProbabilitiesEstimator, OutcomeSpace, DiscreteInfoEstimator.

CausalityTools.KLDivergenceType
KLDivergence <: DivergenceOrDistance

The Kullback-Leibler (KL) divergence.

Usage

  • Use with association to compute the KL-divergence between two pre-computed probability distributions, or from raw data using one of the estimators listed below.

Compatible estimators

Description

The KL-divergence between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace $\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as

\[D_{KL}(P_X(\Omega) || P_Y(\Omega)) = \sum_{\omega \in \Omega} p_x(\omega) \log\dfrac{p_x(\omega)}{p_y(\omega)}\]

Implements

Note

Distances.jl also defines KLDivergence. Qualify it if you're loading both packages, i.e. do association(CausalityTools.KLDivergence(), x, y).

Estimation

CausalityTools.KraskovStögbauerGrassberger1Type
KSG1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1(; k::Int = 1, w = 0, metric_marginals = Chebyshev())

The KraskovStögbauerGrassberger1 mutual information estimator (you can use KSG1 for short) is the $I^{(1)}$ k-th nearest neighbor estimator from Kraskov2004.

Compatible definitions

Usage

  • Use with association to compute Shannon mutual information from input data.
  • Use with some IndependenceTest to test for independence between variables.

Keyword arguments

  • k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
  • metric_marginals: The distance metric for the marginals can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
  • w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.

Example

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG1(; k = 10), x, y) # should be near 0 (and can be negative)
CausalityTools.KraskovStögbauerGrassberger2Type
KSG2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2(; k::Int = 1, w = 0, metric_marginals = Chebyshev())

The KraskovStögbauerGrassberger2 mutual information estimator (you can use KSG2 for short) is the $I^{(2)}$ k-th nearest neighbor estimator from Kraskov2004.

Compatible definitions

Usage

  • Use with association to compute Shannon mutual information from input data.
  • Use with some IndependenceTest to test for independence between variables.

Keyword arguments

  • k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
  • metric_marginals: The distance metric for the marginals can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
  • w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.

Description

Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.

The KraskovStögbauerGrassberger2 estimator first locates, for each $\bf{x}_i \in X$, the point $\bf{n}_i \in X$, the k-th nearest neighbor to $\bf{x}_i$, according to the maximum norm (Chebyshev metric). Let $\epsilon_i$ be the distance $d(\bf{x}_i, \bf{n}_i)$.

Consider $x_i^m \in \bf{X}_m$, the $i$-th point in the marginal space $\bf{X}_m$. For each $\bf{x}_i^m$, we determine $\theta_i^m$ := the number of points $\bf{x}_k^m \in \bf{X}_m$ that are a distance less than $\epsilon_i$ away from $\bf{x}_i^m$. That is, we use the distance from a query point $\bf{x}_i \in X$ (in the joint space) to count neighbors of $x_i^m \in \bf{X}_m$ (in the marginal space).

Shannon mutual information between the variables $\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m$ is then estimated as

\[\hat{I}_{KSG2}(\bf{X}) = \psi{(k)} - \dfrac{m - 1}{k} + (m - 1)\psi{(N)} - \dfrac{1}{N} \sum_{i = 1}^N \sum_{j = 1}^m \psi{(\theta_i^j + 1)}\]

Example

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG2(; k = 10), x, y) # should be near 0 (and can be negative)
CausalityTools.LMeasureType
LMeasure <: ClosenessMeasure
LMeasure(; K::Int = 2, dx = 2, dy = 2, τx = -1, τy = -1, w = 0)

The LMeasure Chicharro2009 is a pairwise association measure. It quantifies the probability with which close states of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.

Note that τx and τy are negative by convention. See docstring for SMeasure for an explanation.

Usage

  • Use with association to compute the raw L-measure statistic.
  • Use with independence to perform a formal hypothesis test for directional dependence.

Description

LMeasure is similar to MMeasure, but uses distance ranks instead of the raw distances.

Let $\bf{x_i}$ be an embedding vector, and let $g_{i,j}$ denote the rank of the distance between $\bf{x_i}$ and some other vector $\bf{x_j}$ in an ascending-ordered list of the distances between $\bf{x_i}$ and all $\bf{x_{j \neq i}}$. In other words, $g_{i,j}$ ranks the $N-1$ distances from $\bf{x_i}$ to the other points.

LMeasure is then defined as

\[L^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{G_i(x) - G_i^{(k)}(x|y)}{G_i(x) - G_i^k(x)} \right),\]

where $G_i(x) = \frac{N}{2}$ and $G_i^K(x) = \frac{k+1}{2}$ are the mean and minimal rank, respectively.

The $y$-conditioned mean rank is defined as

\[G_i^{(k)}(x|y) = \dfrac{1}{K}\sum_{j=1}^{K} g_{i,w_{i, j}},\]

where $w_{i,j}$ is the index of the $j$-th nearest neighbor of $\bf{y_i}$.

See also: ClosenessMeasure.

CausalityTools.LindnerType
Lindner <: TransferEntropyEstimator
Lindner(definition = TEShannon(); k = 1, w = 0, base = 2)

The Lindner transfer entropy estimator Lindner2011, which is also used in the Trentool MATLAB toolbox, and is based on nearest neighbor searches.

Compatible definitions

Usage

Keyword parameters

w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

The estimator can be used both for pairwise and conditional transfer entropy estimation.

Description

For a given point jᵢ in the joint embedding space, this estimator first computes the distance dᵢ from jᵢ to its k-th nearest neighbor. Then, for each point mₖ[i] in the k-th marginal space, it counts the number of points within radius dᵢ.

The Shannon transfer entropy is then computed as

\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]

where the index k references the three marginal subspaces T, TTf and ST for which neighbor searches are performed. Here this estimator has been modified to allow for conditioning too (a simple modification to Lindner2011's equation 5 and 6).

Example

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Lindner(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)
CausalityTools.LocalPermutationTestType
LocalPermutationTest <: IndependenceTest
LocalPermutationTest(measure, [est];
    kperm::Int = 5,
    nshuffles::Int = 100,
    rng = Random.default_rng(),
    replace = true,
    w::Int = 0,
    show_progress = false)

LocalPermutationTest is a generic conditional independence test Runge2018LocalPerm for assessing whether two variables X and Y are conditionally independent given a third variable Z (all of which may be multivariate).

When used with independence, a LocalPermutationTestResult is returned.

Description

This is a generic one-sided hypothesis test that checks whether X and Y are independent (given Z, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between x and y, but preserve dependencies between x and z.

The algorithm is as follows:

  1. Compute the original conditional independence statistic I(X; Y | Z).
  2. Allocate a scalar valued vector with space for nshuffles elements.
  3. For k ∈ [1, 2, …, nshuffles], repeat
    • For each zᵢ ∈ Z, let nᵢ be the time indices of the kperm nearest neighbors of zᵢ, excluding the w nearest neighbors of zᵢ from the neighbor query (i.e. w is the Theiler window).
    • Let xᵢ⋆ = X[j], where j is randomly sampled from nᵢ with replacement. This way, xᵢ is replaced with xⱼ only if zᵢ ≈ zⱼ (zᵢ and zⱼ are close). Repeat for i = 1, 2, …, n and obtain the shuffled X̂ = [x̂₁, x̂₂, …, x̂ₙ].
    • Compute the conditional independence statistic Iₖ(X̂; Y | Z).
    • Let Î[k] = Iₖ(X̂; Y | Z).
  4. Compute the p-value as count(Î .>= I) / nshuffles.

In addition to the conditional variant from Runge (2018), we also provide a pairwise version, where the shuffling procedure is identical, except that neighbors in Y are used instead of Z, and we compute I(X; Y) and Iₖ(X̂; Y) instead of I(X; Y | Z) and Iₖ(X̂; Y | Z).

Compatible measures

| Measure | Requires est | Note |
| --- | --- | --- |
| PartialCorrelation | No | |
| DistanceCorrelation | No | |
| CMIShannon | Yes | |
| TEShannon | Yes | Pairwise tests are not possible with TransferEntropyEstimators, only with lower-level estimators, e.g. FPVP, GaussianMI or Kraskov. |
| PartialMutualInformation | Yes | |

The LocalPermutationTest is only defined for conditional independence testing. Exceptions are for measures like TEShannon, which use conditional measures under the hood even for their pairwise variants, and are therefore compatible with LocalPermutationTest.

The nearest-neighbor approach in Runge (2018) can be reproduced by using the CMIShannon measure with the FPVP estimator.

Examples
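A minimal sketch of a conditional independence test where X and Y are only linked through Z; the (measure, estimator) combination mirrors the defaults used by OCE elsewhere in this documentation, and the data are illustrative.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 1000)
z = rand(rng, 1000) .+ x
y = rand(rng, 1000) .+ z
# X ⫫ Y | Z holds by construction, so the test should not reject the null
test = LocalPermutationTest(CMIShannon(), MesnerShalizi(k = 3, w = 3); nshuffles = 30)
independence(test, x, y, z)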

CausalityTools.LocalPermutationTestResultType
LocalPermutationTestResult(m, m_surr, pvalue)

Holds the result of a LocalPermutationTest. m is the measure computed on the original data. m_surr is a vector of the measure computed on permuted data, where m_surr[i] is the measure computed on the i-th permutation. pvalue is the one-sided p-value for the test.

CausalityTools.MCRType
MCR <: AssociationMeasure
MCR(; r, metric = Euclidean())

An association measure based on mean conditional probabilities of recurrence (MCR) introduced by Romano2007.

Usage

  • Use with association to compute the raw MCR for pairwise or conditional association.
  • Use with IndependenceTest to perform a formal hypothesis test for pairwise or conditional association.

Description

r is a mandatory keyword which specifies the recurrence threshold used when constructing recurrence matrices. It can be an instance of any subtype of AbstractRecurrenceType from RecurrenceAnalysis.jl. To use any r that is not a real number, you must first do using RecurrenceAnalysis. The metric is any valid metric from Distances.jl.

For input variables X and Y, the conditional probability of recurrence is defined as

\[M(X | Y) = \dfrac{1}{N} \sum_{i=1}^N p(\bf{y_i} | \bf{x_i}) = \dfrac{1}{N} \sum_{i=1}^N \dfrac{\sum_{j=1}^N J_{R_{i, j}}^{X, Y}}{\sum_{j=1}^N R_{i, j}^X},\]

where $R_{i, j}^X$ is the recurrence matrix and $J_{R_{i, j}}^{X, Y}$ is the joint recurrence matrix, constructed using the given metric. The measure $M(Y | X)$ is defined analogously.

Romano2007's interpretation of this quantity is that if X drives Y, then M(X|Y) > M(Y|X), if Y drives X, then M(Y|X) > M(X|Y), and if coupling is symmetric, then M(Y|X) = M(X|Y).

Input data

X and Y can be either both univariate timeseries, or both multivariate StateSpaceSets.

Estimation
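A minimal sketch using only the mandatory recurrence threshold r; the parameter values and data are illustrative.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 500), rand(rng, 500)
m = MCR(r = 0.5) # the recurrence threshold r is mandatory
association(m, x, y) # raw MCR statistic for the pair (x, y)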

CausalityTools.MIDecompositionType
MIDecomposition(definition::MultivariateInformationMeasure, 
    est::MutualInformationEstimator)

Estimate the MultivariateInformationMeasure specified by definition by decomposing the measure, if possible, into a combination of mutual information terms. These terms are individually estimated using the given MutualInformationEstimator est, and are finally combined to form the value of the measure.

Usage

Examples

See also: MultivariateInformationMeasureEstimator.

CausalityTools.MIRenyiJizbaType
MIRenyiJizba <: BivariateInformationMeasure
MIRenyiJizba(; q = 1.5, base = 2)

The Rényi mutual information $I_q^{R_{J}}(X; Y)$ defined in Jizba2012.

Usage

  • Use with association to compute the raw Rényi-Jizba mutual information from input data using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise dependence using the Rényi-Jizba mutual information.

Compatible estimators

Definition

\[I_q^{R_{J}}(X; Y) = H_q^{R}(X) + H_q^{R}(Y) - H_q^{R}(X, Y),\]

where $H_q^{R}(\cdot)$ is the Rényi entropy.

Estimation

CausalityTools.MIRenyiSarbuType
MIRenyiSarbu <: BivariateInformationMeasure
MIRenyiSarbu(; base = 2, q = 1.5)

The discrete Rényi mutual information from Sarbu2014.

Usage

  • Use with association to compute the raw Rényi-Sarbu mutual information from input data using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise dependence using the Rényi-Sarbu mutual information.

Compatible estimators

Description

Sarbu (2014) defines discrete Rényi mutual information as the Rényi $\alpha$-divergence between the conditional joint probability mass function $p(x, y)$ and the product of the conditional marginals, $p(x) \cdot p(y)$:

\[I(X, Y)^R_q = \dfrac{1}{q-1} \log \left( \sum_{x \in X, y \in Y} \dfrac{p(x, y)^q}{\left( p(x)\cdot p(y) \right)^{q-1}} \right)\]

Estimation

CausalityTools.MIShannonType
MIShannon <: BivariateInformationMeasure
MIShannon(; base = 2)

The Shannon mutual information $I_S(X; Y)$.

Usage

  • Use with association to compute the raw Shannon mutual information from input data using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise dependence using the Shannon mutual information.

Compatible estimators

Discrete definition

There are many equivalent formulations of discrete Shannon mutual information, meaning that it can be estimated in several ways, either using JointProbabilities (double-sum formulation), EntropyDecomposition (three-entropies decomposition), or some dedicated estimator.

Double sum formulation

Assume we observe $N$ sample pairs from two discrete random variables $X$ and $Y$ with finite supports $\mathcal{X} = \{ x_1, x_2, \ldots, x_{M_x} \}$ and $\mathcal{Y} = \{ y_1, y_2, \ldots, y_{M_y} \}$. The double-sum estimate is obtained by replacing the probabilities in the double sum

\[\hat{I}_{DS}(X; Y) = \sum_{x_i \in \mathcal{X}, y_j \in \mathcal{Y}} \hat{p}(x_i, y_j) \log \left( \dfrac{\hat{p}(x_i, y_j)}{\hat{p}(x_i)\hat{p}(y_j)} \right),\]

with the relative frequencies $\hat{p}(x_i) = \frac{n(x_i)}{N}$, $\hat{p}(y_j) = \frac{n(y_j)}{N}$ and $\hat{p}(x_i, y_j) = \frac{n(x_i, y_j)}{N}$, where $n(\cdot)$ counts occurrences among the $N$ observed samples. This definition is used by association when called with a JointProbabilities estimator.

Three-entropies formulation

An equivalent formulation of discrete Shannon mutual information is

\[I^S(X; Y) = H^S(X) + H^S(Y) - H^S(X, Y),\]

where $H^S(\cdot)$ and $H^S(\cdot, \cdot)$ are the marginal and joint discrete Shannon entropies. This definition is used by association when called with an EntropyDecomposition estimator and a discretization.

Differential mutual information

One possible formulation of differential Shannon mutual information is

\[I^S(X; Y) = h^S(X) + h^S(Y) - h^S(X, Y),\]

where $h^S(\cdot)$ and $h^S(\cdot, \cdot)$ are the marginal and joint differential Shannon entropies. This definition is used by association when called with an EntropyDecomposition estimator and a DifferentialInfoEstimator.

Estimation
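A minimal sketch of estimating MIShannon from raw data with a nearest-neighbor estimator. KSG2(k = 3, w = 3) mirrors the OCE defaults elsewhere in this documentation; that KSG2 targets MIShannon by default is an assumption here.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = randn(rng, 1000)
y = x .+ randn(rng, 1000)
association(KSG2(k = 3, w = 3), x, y) # positive for dependent x and y (KSG2 assumed to default to MIShannon)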

CausalityTools.MITsallisFuruichiType
MITsallisFuruichi <: BivariateInformationMeasure
MITsallisFuruichi(; base = 2, q = 1.5)

The discrete Tsallis mutual information from Furuichi2006, which in that paper is called the mutual entropy.

Usage

  • Use with association to compute the raw Tsallis-Furuichi mutual information from input data using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise dependence using the Tsallis-Furuichi mutual information.

Compatible estimators

Description

Furuichi's Tsallis mutual entropy between variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$ is defined as

\[I_q^T(X; Y) = H_q^T(X) - H_q^T(X | Y) = H_q^T(X) + H_q^T(Y) - H_q^T(X, Y),\]

where $H_q^T(\cdot)$ and $H_q^T(\cdot, \cdot)$ are the marginal and joint Tsallis entropies, and $q$ is the Tsallis parameter.

Estimation

CausalityTools.MITsallisMartinType
MITsallisMartin <: BivariateInformationMeasure
MITsallisMartin(; base = 2, q = 1.5)

The discrete Tsallis mutual information from Martin2004.

Usage

  • Use with association to compute the raw Tsallis-Martin mutual information from input data using one of the estimators listed below.
  • Use with independence to perform a formal hypothesis test for pairwise dependence using the Tsallis-Martin mutual information.

Compatible estimators

Description

Martin et al.'s Tsallis mutual information between variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$ is defined as

\[I_{\text{Martin}}^T(X, Y, q) := H_q^T(X) + H_q^T(Y) - (1 - q) H_q^T(X) H_q^T(Y) - H_q^T(X, Y),\]

where $H_q^T(\cdot)$ and $H_q^T(\cdot, \cdot)$ are the marginal and joint Tsallis entropies, and $q$ is the Tsallis parameter.

Estimation

CausalityTools.MMeasureType
MMeasure <: ClosenessMeasure
MMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)

The MMeasure Andrzejak2003 is a pairwise association measure. It quantifies the probability with which close states of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.

Note that τx and τy are negative by convention. See docstring for SMeasure for an explanation.

Usage

  • Use with association to compute the raw m-measure statistic.
  • Use with independence to perform a formal hypothesis test for directional dependence.

Description

The MMeasure is based on SMeasure and HMeasure. It is given by

\[M^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{R_i(x) - R_i^{(k)}(x|y)}{R_i(x) - R_i^k(x)} \right),\]

where $R_i(x)$ is computed as for HMeasure, while $R_i^k(x)$ and $R_i^{(k)}(x|y)$ are computed as for SMeasure. Parameters also have the same meaning as for SMeasure/HMeasure.

See also: ClosenessMeasure.

CausalityTools.MesnerShaliziType
MesnerShalizi <: ConditionalMutualInformationEstimator
MesnerShalizi(definition = CMIShannon(); k = 1, w = 0)

The MesnerShalizi ConditionalMutualInformationEstimator is designed for data that can be mixtures of discrete and continuous data Mesner2020.

k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.

Compatible definitions

Usage

Examples

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(MesnerShalizi(; k = 10), x, z, y) # should be near 0 (and can be negative)
CausalityTools.MultivariateInformationMeasureType
MultivariateInformationMeasure <: AssociationMeasure

The supertype for all multivariate information-based measure definitions.

Definition

Following Datseris2024, we define a multivariate information measure as any functional of a multidimensional probability mass function (PMF) or probability density.

Implementations

JointEntropy definitions:

ConditionalEntropy definitions:

DivergenceOrDistance definitions:

MutualInformation definitions:

ConditionalMutualInformation definitions:

TransferEntropy definitions:

Other definitions:

CausalityTools.MultivariateInformationMeasureEstimatorType
CausalityTools.OCEType
OCE <: GraphAlgorithm
OCE(; utest::IndependenceTest = SurrogateAssociationTest(MIShannon(), KSG2(k = 3, w = 3)),
      ctest::C = LocalPermutationTest(CMIShannon(), MesnerShalizi(k = 3, w = 3)),
      τmax::T = 1, α = 0.05)

The optimal causation entropy (OCE) algorithm for causal discovery Sun2015.

Description

The OCE algorithm has three steps to determine the parents of a variable xᵢ.

  1. Perform pairwise independence tests using utest and select the variable xⱼ(-τ) that has the highest significant (i.e. with associated p-value below α) association with xᵢ(0). Assign it to the set of selected parents P.
  2. Perform conditional independence tests using ctest, finding the parent Pₖ that has the highest association with xᵢ given the already selected parents, and add it to P. Repeat until no more variables with significant association are found.
  3. Backwards elimination of parents Pₖ of xᵢ(0) for which xᵢ(0) ⫫ Pₖ | P - {Pₖ}, where P is the set of parent nodes found in the previous steps.

τmax indicates the maximum lag τ between the target variable xᵢ(0) and its potential parents xⱼ(-τ). Sun et al. 2015's method is based on τmax = 1.

Returns

When used with infer_graph, it returns a vector p, where p[i] are the parents for each input variable. This result can be converted to a SimpleDiGraph from Graphs.jl (see example).

Usage

OCE is used with infer_graph to infer the parents of the input data. Input data must either be a Vector{Vector{<:Real}}, or a StateSpaceSet.

Examples
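A minimal sketch using the default pairwise/conditional tests on a toy chain x → y → z; the data-generating model is illustrative only.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 500)
y = [0.0; 0.6 .* x[1:end-1]] .+ 0.2 .* rand(rng, 500)
z = [0.0; 0.6 .* y[1:end-1]] .+ 0.2 .* rand(rng, 500)
parents = infer_graph(OCE(τmax = 1), [x, y, z]) # Vector{OCESelectedParents}, one per variable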

CausalityTools.OCESelectedParentsType
OCESelectedParents

A simple struct for storing the parents of a single variable xᵢ inferred by the OCE algorithm. When using OCE with infer_graph, a Vector{OCESelectedParents} is returned - one per variable in the input data.

Assumptions and notation

Assumes the input x is a Vector{Vector{<:Real}} or a StateSpaceSet (for which each column is treated as a variable). It contains the following fields, where we use the notation xₖ(τ) to indicate the k-th variable lagged by time-lag τ. For example, x₂(-3) is the variable x[2] lagged by 3 time steps.

Fields

  • i: The index of the target variable (i.e. xᵢ(0) is the target).
  • all_idxs: The possible variable indices of parent variables (i.e. 1:M, where M is the number of input variables).
  • parents_js: The variable indices of the selected parent variables, one per selected parent.
  • parents_τs: The lags for the selected parent variables, one per selected parent.
  • parents: A vector containing the raw, time-lagged data for each selected parent variable. Let τ = parents_τs[k] and j = parents_js[k]. Then parents[k] is the raw data for the variable xⱼ(-τ).
CausalityTools.OptimiseTraditionalType
OptimiseTraditional(; η = 1, maxlag = 50, maxdim = 10,
    method = delay_ifnn, dmethod = "mi_min")

Optimize embedding parameters using traditional delay embedding optimization methods with a maximum lag of maxlag and a maximum dimension of maxdim. method can be either delay_ifnn, delay_fnn or delay_f1nn.

CausalityTools.PCType
PC <: GraphAlgorithm
PC(pairwise_test, conditional_test;
    α = 0.05, max_depth = Inf, maxiters_orient = Inf)

The PC algorithm Spirtes2000, named after the first names of its authors, Peter Spirtes and Clark Glymour, implemented as described in Kalisch2008.

Arguments

Keyword arguments

  • α::Real. The significance level of the test.
  • max_depth. The maximum level of conditional independence tests to be performed. By default, there is no limit (i.e. max_depth = Inf), meaning that the maximum depth is N - 2, where N is the number of input variables.
  • maxiters_orient::Real. The maximum number of times to apply the orientation rules. By default, there is no limit (i.e. maxiters_orient = Inf).
Directional measures will not give meaningful answers

During the skeleton search phase, if a significant association between two nodes is found, then a bidirectional edge is drawn between them. The generic implementation of PC therefore doesn't currently handle directional measures such as TEShannon. The reason is that if a directional relationship X → Y exists between two nodes X and Y, then the algorithm would first draw a bidirectional arrow between X and Y when analysing the direction X → Y, and then remove it again when analysing the direction Y → X (a similar situation would also occur during the conditional stage). This will be fixed in a future release. For now, use nondirectional measures, e.g. MIShannon and CMIShannon!

Description

When used with infer_graph on some input data x, the PC algorithm performs the following steps:

  1. Initialize an empty fully connected graph g with N nodes, where N is the number of variables and x[i] is the data for the i-th node.
  2. Reduce the fully connected g to a skeleton graph by performing pairwise independence tests between all vertices using pairwise_test. Remove any edges where adjacent vertices are found to be independent according to the test (i.e. the null hypothesis of independence cannot be rejected at significance level 1 - α).
  3. Thin the skeleton g by conditional independence testing. If x[i] ⫫ x[j] | x[Z] for some set of variables Z (not including i and j) according to conditional_test (i.e. the null hypothesis of conditional independence cannot be rejected at significance level 1 - α), then the edge between i and j is removed, and we record the separating set S(i, j) = Z, which stores the variables in the conditioning set that rendered variables i and j independent. Independence tests are first performed for conditioning sets of size 1, and then repeated for conditioning sets of increasing size, which in most cases limits the number of tests needed. If max_depth is an integer, this procedure is performed for conditioning sets of sizes 1:max_depth; if there is no limit (the default max_depth = Inf), then all possible conditioning set sizes are potentially used.
  4. Create a directed graph dg from g by replacing every undirected edge X - Y in g by the bidirectional edge X ↔ Y (i.e. construct two directional edges X → Y and Y → X). Orientation rules 0-3 are then repeatedly applied to dg until no more edges can be oriented:
    • Rule 0 (orients v-structures): X ↔ Y ↔ Z becomes X → Y ← Z if Y is not in the separating set S(X, Z).
    • Rule 1 (prevents new v-structures): X → Y ↔ Z becomes X → Y → Z if X and Z are not adjacent.
    • Rule 2 (avoids cycles): X → Y → Z ↔ X becomes X → Y → Z ← X.
    • Rule 3: To avoid creating cycles or new v-structures, whenever X - Y → Z, X - W → Z, and X - Z but there is no edge between Y and W, turn the undirected X - Z edge into the directed edge X → Z.

The resulting directed graph (a SimpleDiGraph from Graphs.jl) is then returned.

Examples
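A minimal sketch with a hypothetical choice of tests (surrogate tests with Pearson/partial correlation); any compatible nondirectional pairwise/conditional independence tests can be substituted, and the data are illustrative.

using CausalityTools
using Random; rng = MersenneTwister(1234)
n = 300
x = randn(rng, n)
y = x .+ 0.3 .* randn(rng, n)
z = y .+ 0.3 .* randn(rng, n)
pairwise_test = SurrogateAssociationTest(PearsonCorrelation(); nshuffles = 50)
conditional_test = SurrogateAssociationTest(PartialCorrelation(); nshuffles = 50)
g = infer_graph(PC(pairwise_test, conditional_test; α = 0.05), [x, y, z]) # SimpleDiGraph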

CausalityTools.PairwiseAsymmetricInferenceType
PairwiseAsymmetricInference <: CrossmapMeasure
PairwiseAsymmetricInference(; d::Int = 2, τ::Int = -1, w::Int = 0,
    f = Statistics.cor, embed_warn = true)

The pairwise asymmetric inference (PAI) measure McCracken2014 is a version of ConvergentCrossMapping that searches for neighbors in mixed embeddings (i.e. both source and target variables included); otherwise, the algorithms are identical.

Usage

  • Use with association to compute the pairwise asymmetric inference measure between variables.

Compatible estimators

Description

The Theiler window w controls how many temporal neighbors are excluded during neighbor searches (w = 0 means that only the point itself is excluded). f is a function that computes the agreement between observations and predictions (the default, f = Statistics.cor, gives the Pearson correlation coefficient).

Embedding

There are many possible ways of defining the embedding for PAI. Currently, we only implement the "add one non-lagged source timeseries to an embedding of the target" approach, which is used as an example in McCracken & Weigel's paper. Specifically: Let S(i) be the source time series variable and T(i) be the target time series variable. PairwiseAsymmetricInference produces regular embeddings with fixed dimension d and embedding lag τ as follows:

\[(S(i), T(i+(d-1)\tau), \ldots, T(i+2\tau), T(i+\tau), T(i))_{i=1}^{N-(d-1)\tau}.\]

In this joint embedding, neighbor searches are performed in the subspace spanned by the first D variables, while the last variable is to be predicted.

With this convention, τ < 0 implies "past/present values of source used to predict target", and τ > 0 implies "future/present values of source used to predict target". The latter case may not be meaningful for many applications, so by default, a warning will be given if τ > 0 (embed_warn = false turns off warnings).

Estimation
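A minimal sketch that cross maps over randomly sampled point libraries with a RandomVectors estimator; the library sizes and coupling are illustrative assumptions.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 400)
y = 0.8 .* circshift(x, 1) .+ 0.1 .* rand(rng, 400)
m = PairwiseAsymmetricInference(d = 3, τ = -1)
est = RandomVectors(m; libsizes = 100:100:300, replace = true, rng = rng)
association(est, x, y) # one cross-map estimate per library size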

CausalityTools.PartialCorrelationType
PartialCorrelation <: AssociationMeasure

The correlation of two variables, with the effect of a set of conditioning variables removed.

Usage

  • Use with association to compute the raw partial correlation coefficient.
  • Use with independence to perform a formal hypothesis test for correlation-based conditional independence.

Description

There are several ways of estimating the partial correlation. We follow the matrix inversion method, because for StateSpaceSets, we can very efficiently compute the required joint covariance matrix $\Sigma$ for the random variables.

Formally, let $X_1, X_2, \ldots, X_n$ be a set of $n$ real-valued random variables. Consider the joint precision matrix, $P = (p_{ij}) = \Sigma^{-1}$. The partial correlation of any pair of variables $(X_i, X_j)$, given the remaining variables $\bf{Z} = \{X_k\}_{k=1, k \neq i, j}^n$, is defined as

\[\rho_{X_i X_j | \bf{Z}} = -\dfrac{p_{ij}}{\sqrt{ p_{ii} p_{jj} }}.\]

In practice, we compute the estimate

\[\hat{\rho}_{X_i X_j | \bf{Z}} = -\dfrac{\hat{p}_{ij}}{\sqrt{ \hat{p}_{ii} \hat{p}_{jj} }},\]

where $\hat{P} = \hat{\Sigma}^{-1}$ is the sample precision matrix.
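For instance, conditioning on a common driver removes a spurious pairwise correlation:

using CausalityTools
using Random; rng = MersenneTwister(1234)
z = randn(rng, 1000)
x = z .+ 0.3 .* randn(rng, 1000)
y = z .+ 0.3 .* randn(rng, 1000)
association(PearsonCorrelation(), x, y) # strongly positive: x and y share the driver z
association(PartialCorrelation(), x, y, z) # near zero once z is conditioned out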

CausalityTools.PartialMutualInformationType
PartialMutualInformation <: MultivariateInformationMeasure
PartialMutualInformation(; base = 2)

The partial mutual information (PMI) measure of conditional association Zhao2016.

Definition

For variables $X$, $Y$ and $Z$, PMI is defined as

\[PMI(X; Y | Z) = D(p(x, y, z) || p^{*}(x|z) p^{*}(y|z) p(z)),\]

where $p(x, y, z)$ is the joint distribution for $X$, $Y$ and $Z$, and $D(\cdot, \cdot)$ is the extended Kullback-Leibler divergence from $p(x, y, z)$ to $p^{*}(x|z) p^{*}(y|z) p(z)$. See Zhao2016 for details.

Estimation

The PMI is estimated by first estimating a 3D probability mass function using probabilities, then computing $PMI(X; Y | Z)$ from those probabilities.
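A minimal sketch, assuming that the JointProbabilities estimator accepts a measure definition together with a CodifyVariables discretization (as suggested by the discretization API described later in this documentation); the data are illustrative.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 2000)
z = rand(rng, 2000) .+ x
y = rand(rng, 2000) .+ z
# assumed constructor: JointProbabilities(definition, discretization)
est = JointProbabilities(PartialMutualInformation(), CodifyVariables(OrdinalPatterns(m = 3)))
association(est, x, y, z) # should be close to 0, since X ⫫ Y | Z by construction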

Properties

For the discrete case, the following identities hold in theory (when estimating PMI, they may not).

  • PMI(X, Y, Z) >= CMI(X, Y, Z) (where CMI is the Shannon CMI). Holds in theory, but when estimating PMI, the identity may not hold.
  • PMI(X, Y, Z) >= 0. Holds both in theory and when estimating using discrete estimators.
  • X ⫫ Y | Z => PMI(X, Y, Z) = CMI(X, Y, Z) = 0 (in theory, but not necessarily for estimation).
CausalityTools.PearsonCorrelationType
PearsonCorrelation

The Pearson correlation of two variables.

Usage

  • Use with association to compute the raw Pearson correlation coefficient.
  • Use with independence to perform a formal hypothesis test for pairwise dependence using the Pearson correlation coefficient.

Description

The sample Pearson correlation coefficient for real-valued random variables $X$ and $Y$ with associated samples $\{x_i\}_{i=1}^N$ and $\{y_i\}_{i=1}^N$ is defined as

\[\rho_{xy} = \dfrac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y}) }{\sqrt{\sum_{i=1}^N (x_i - \bar{x})^2}\sqrt{\sum_{i=1}^N (y_i - \bar{y})^2}},\]

where $\bar{x}$ and $\bar{y}$ are the means of the observations $x_i$ and $y_i$, respectively.

CausalityTools.PhaseType
Phase <: InstantaneousSignalProperty

Indicates that the instantaneous phases of a signal should be used.

CausalityTools.PoczosSchneiderCMIType
PoczosSchneiderCMI <: ConditionalMutualInformationEstimator
PoczosSchneiderCMI(definition = CMIRenyiPoczos(); k = 1, w = 0)

The PoczosSchneiderCMI ConditionalMutualInformationEstimator computes conditional mutual informations using a k-th nearest neighbor approach Poczos2012.

k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.

Compatible definitions

Usage

Examples

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(PoczosSchneiderCMI(CMIRenyiPoczos(), k = 10), x, z, y) # should be near 0 (and can be negative)
CausalityTools.RMCDType
RMCD <: AssociationMeasure
RMCD(; r, metric = Euclidean(), base = 2)

The recurrence measure of conditional dependence, or RMCD Ramos2017, is a recurrence-based measure that mimics the conditional mutual information, but uses recurrence probabilities.

Usage

  • Use with association to compute the raw RMCD for pairwise or conditional association.
  • Use with IndependenceTest to perform a formal hypothesis test for pairwise or conditional association.

Description

r is a mandatory keyword which specifies the recurrence threshold used when constructing recurrence matrices. It can be an instance of any subtype of AbstractRecurrenceType from RecurrenceAnalysis.jl. To use any r that is not a real number, you must first do using RecurrenceAnalysis. The metric is any valid metric from Distances.jl.

Both the pairwise and conditional RMCD are non-negative, but due to round-off error, negative values may occur. If that happens, an RMCD value of 0.0 is returned.

Description

The RMCD measure is defined by

\[I_{RMCD}(X; Y | Z) = \dfrac{1}{N} \sum_{i} \left[ \dfrac{1}{N} \sum_{j} R_{ij}^{X, Y, Z} \log \left( \dfrac{\sum_{j} R_{ij}^{X, Y, Z} \sum_{j} R_{ij}^{Z} }{\sum_{j} R_{ij}^{X, Z} \sum_{j} R_{ij}^{Y, Z}} \right) \right],\]

where base controls the base of the logarithm. $I_{RMCD}(X; Y | Z)$ is zero when $Z = X$, $Z = Y$ or when $X$, $Y$ and $Z$ are mutually independent.

Our implementation allows dropping the third/last argument, in which case the following mutual information-like quantity is computed (not discussed in Ramos2017):

\[I_{RMCD}(X; Y) = \dfrac{1}{N} \sum_{i} \left[ \dfrac{1}{N} \sum_{j} R_{ij}^{X, Y} \log \left( \dfrac{\sum_{j} R_{ij}^{X} R_{ij}^{Y} }{\sum_{j} R_{ij}^{X, Y}} \right) \right]\]

Estimation
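A minimal sketch of the pairwise and conditional variants; the recurrence threshold and data are illustrative.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x, y, z = rand(rng, 300), rand(rng, 300), rand(rng, 300)
m = RMCD(r = 0.5)
association(m, x, y) # pairwise, mutual information-like quantity
association(m, x, y, z) # conditional version I_RMCD(X; Y | Z)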

CausalityTools.RahimzamaniType
Rahimzamani <: ConditionalMutualInformationEstimator
Rahimzamani(k = 1, w = 0)

The Rahimzamani ConditionalMutualInformationEstimator is designed for data that can be mixtures of discrete and continuous data Rahimzamani2018.

Compatible definitions

Usage

Description

This estimator is very similar to the GaoKannanOhViswanath mutual information estimator, but has been expanded to the conditional mutual information case.

k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.

Examples

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(Rahimzamani(; k = 10), x, z, y) # should be near 0 (and can be negative)
CausalityTools.RandomSegmentType
RandomSegment <: CrossmapEstimator
RandomSegment(definition::CrossmapMeasure; libsizes::Int, rng = Random.default_rng())

Cross map once over N = length(libsizes) different "point libraries", where point indices are selected as time-contiguous segments with random starting points.

This is method 2 from Luo2015. See CrossmapEstimator for an in-depth explanation of what "library" means in this context.

Description

The cardinalities of the point index segments are given by libsizes. One segment with a randomly selected starting point is picked per L ∈ libsizes, and the i-th point index segment has cardinality k = libsizes[i].

The starting point for each library is selected independently of the other libraries, and a user-specified rng may be given for reproducibility. If the time series you're cross mapping between have length M, and Lᵢ > M for any Lᵢ ∈ libsizes, then an error will be thrown.

Returns

The return type when used with association depends on the type of libsizes.

  • If libsizes is an Int (a single library), then a single cross-map estimate is returned.
  • If libsizes is an AbstractVector{Int} (multiple libraries), then a vector of cross-map estimates is returned, one per library.

See also: CrossmapEstimator.

CausalityTools.RandomVectorsType
RandomVectors <: CrossmapEstimator
RandomVectors(definition::CrossmapMeasure; libsizes, replace = false, 
    rng = Random.default_rng())

Cross map once over N = length(libsizes) different "point libraries", where point indices are selected randomly (not considering time ordering).

This is method 3 from Luo2015. See CrossmapEstimator for an in-depth explanation of what "library" means in this context.

Description

The cardinality of the point libraries are given by libsizes. One set of random point indices is selected per L ∈ libsizes, and the i-th library has cardinality k = libsizes[i].

Point indices within each library are selected randomly, independently of the other libraries. A user-specified rng may be given for reproducibility. The replace argument controls whether sampling is done with or without replacement. If the time series you're cross mapping between have length M, and Lᵢ > M for any Lᵢ ∈ libsizes, then you must set replace = true.

Returns

The return type when used with association depends on the type of libsizes.

  • If libsizes is an Int (a single library), then a single cross-map estimate is returned.
  • If libsizes is an AbstractVector{Int} (multiple libraries), then a vector of cross-map estimates is returned, one per library.

See also: CrossmapEstimator.

CausalityTools.RenyiDivergenceType
RenyiDivergence <: DivergenceOrDistance
RenyiDivergence(q; base = 2)

The Rényi divergence of positive order q.

Usage

  • Use with association to compute the Rényi divergence between two pre-computed probability distributions, or from raw data using one of the estimators listed below.

Compatible estimators

Description

The Rényi divergence between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_n))$, both defined over the same OutcomeSpace $\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined vanErven2014 as

\[D_{q}(P_X(\Omega) || P_Y(\Omega)) = \dfrac{1}{q - 1} \log \sum_{\omega \in \Omega}p_x(\omega)^{q}p_y(\omega)^{1-q}.\]

Implements

Note

Distances.jl also defines RenyiDivergence. Qualify it if you're loading both packages, i.e. do association(CausalityTools.RenyiDivergence(q), x, y).

Estimation
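A minimal sketch using two pre-computed probability distributions over the same outcomes; Probabilities comes from ComplexityMeasures.jl and is assumed to be re-exported.

using CausalityTools
p1 = Probabilities([0.1, 0.4, 0.5])
p2 = Probabilities([0.3, 0.3, 0.4])
association(RenyiDivergence(2), p1, p2)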

CausalityTools.SMeasureType
SMeasure <: ClosenessMeasure
SMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)

SMeasure is a bivariate association measure from Arnhold1999 and Quiroga2000 that measures directional dependence between two (potentially multivariate) input time series.

Note that τx and τy are negative; see explanation below.

Usage

  • Use with association to compute the raw s-measure statistic.
  • Use with independence to perform a formal hypothesis test for directional dependence.

Description

The steps of the algorithm are:

  1. From input time series $x(t)$ and $y(t)$, construct the delay embeddings (note the positive sign in the embedding lags; therefore inputs parameters τx and τy are by convention negative).

\[\begin{align*} \{\bf{x}_i \} &= \{(x_i, x_{i+\tau_x}, \ldots, x_{i+(d_x - 1)\tau_x}) \} \\ \{\bf{y}_i \} &= \{(y_i, y_{i+\tau_y}, \ldots, y_{i+(d_y - 1)\tau_y}) \} \\ \end{align*}\]

  2. Let $r_{i,j}$ and $s_{i,j}$ be the indices of the $K$ nearest neighbors of $\bf{x}_i$ and $\bf{y}_i$, respectively. Neighbors closer than w time indices are excluded during searches (i.e. w is the Theiler window).

  3. Compute the mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, using the indices $r_{i, j}$.

\[R_i^{(k)}(x) = \dfrac{1}{K} \sum_{j=1}^{K}\left(\bf{x}_i - \bf{x}_{r_{i,j}}\right)^2\]

  4. Compute the y-conditioned mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, now using the indices $s_{i,j}$.

\[R_i^{(k)}(x|y) = \dfrac{1}{K} \sum_{j=1}^{K}\left(\bf{x}_i - \bf{x}_{s_{i,j}}\right)^2\]

  5. Define the following measure of independence, where $0 \leq S \leq 1$. Low values indicate independence, and values close to one occur for synchronized signals.

\[S^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{R_i^{(k)}(x)}{R_i^{(k)}(x|y)}\]

Input data

The algorithm is slightly modified from Arnhold1999 to allow univariate timeseries as input.

  • If x and y are StateSpaceSets then use x and y as is and ignore the parameters dx/τx and dy/τy.
  • If x and y are scalar time series, then create dx- and dy-dimensional embeddings of x and y, respectively, resulting in N embedding points $X = \{x_1, x_2, \ldots, x_N \}$ and $Y = \{y_1, y_2, \ldots, y_N \}$. τx and τy control the embedding lags for x and y.
  • If x is a scalar-valued vector and y is a StateSpaceSet, or vice versa, then create an embedding of the scalar timeseries using parameters dx/τx or dy/τy.

In all three cases, input StateSpaceSets are length-matched by eliminating points at the end of the longest StateSpaceSet (after the embedding step, if relevant) before analysis.

See also: ClosenessMeasure.

CausalityTools.SurrogateAssociationTestType
SurrogateAssociationTest <: IndependenceTest
SurrogateAssociationTest(est_or_measure;
    nshuffles::Int = 100,
    surrogate = RandomShuffle(),
    rng = Random.default_rng(),
    show_progress = false,
)

A generic surrogate-data based (conditional) independence test for assessing whether two variables X and Y are independent, potentially conditioned on a third variable Z.

Compatible estimators and measures

  • Compatible with AssociationMeasures that measure some sort of pairwise or conditional association.
Note

You must yourself determine whether using a particular measure is meaningful, and what it means.

Note

If used with a TransferEntropy measure such as TEShannon, then the source variable is always shuffled, and the target and conditional variable are left unshuffled.

Usage

Description

This is a generic one-sided hypothesis test that checks whether x and y are independent (given z, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between the input variables.

The test first estimates the desired statistic using est_or_measure on the input data. Then, the first input variable is shuffled nshuffles times according to the given surrogate method (each type of surrogate represents a distinct null hypothesis). For each shuffle, est_or_measure is recomputed and the results are stored.

Examples
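A minimal sketch; the two-argument (measure, estimator) form mirrors the OCE defaults elsewhere in this documentation, and the data are illustrative.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 1000)
y = rand(rng, 1000) .+ x
test = SurrogateAssociationTest(MIShannon(), KSG2(k = 3, w = 3); nshuffles = 100, rng = rng)
independence(test, x, y) # the null hypothesis of independence should be rejected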

CausalityTools.SurrogateAssociationTestResultType
SurrogateAssociationTestResult(m, m_surr, pvalue)

Holds the result of a SurrogateAssociationTest. m is the measure computed on the original data. m_surr is a vector of the measure computed on permuted data, where m_surr[i] is the measure computed on the i-th permutation. pvalue is the one-sided p-value for the test.

CausalityTools.SymbolicTransferEntropyType
SymbolicTransferEntropy <: TransferEntropyEstimator
SymbolicTransferEntropy(definition = TEShannon(); m = 3, τ = 1, 
    lt = ComplexityMeasures.isless_rand)

A convenience estimator for symbolic transfer entropy Staniek2008.

Compatible measures

Description

Symbolic transfer entropy consists of two simple steps. First, the input time series are encoded using codify with the CodifyVariables discretization and the OrdinalPatterns outcome space. This transforms the input time series into integer time series.

Transfer entropy is then estimated as usual on the encoded timeseries, with the embedding dictated by definition, using the JointProbabilities estimator.

Examples
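A minimal sketch on a toy unidirectionally coupled pair; the argument order (source, target) and the data-generating model are assumptions for illustration.

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 2000)
y = [0.0; 0.7 .* x[1:end-1]] .+ 0.1 .* rand(rng, 2000)
est = SymbolicTransferEntropy(TEShannon(); m = 3)
association(est, x, y) # TE with x as source and y as target (argument order assumed)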

CausalityTools.TERenyiJizbaType
TERenyiJizba() <: TransferEntropy

The Rényi transfer entropy from Jizba2012.

Usage

  • Use with association to compute the raw transfer entropy.
  • Use with an IndependenceTest to perform a formal hypothesis test for pairwise and conditional dependence.

Description

The transfer entropy from source $S$ to target $T$, potentially conditioned on $C$ is defined as

\[\begin{align*} TE(S \to T) &:= I_q^{R_J}(T^+; S^- | T^-) \\ TE(S \to T | C) &:= I_q^{R_J}(T^+; S^- | T^-, C^-), \end{align*},\]

where $I_q^{R_J}(T^+; S^- | T^-)$ is Jizba et al. (2012)'s definition of conditional mutual information (CMIRenyiJizba). The - and + superscripts on the marginal variables $T^+$, $T^-$, $S^-$ and $C^-$ indicate that the embedding vectors for that marginal are constructed using present/past values and future values, respectively.

Estimation

Estimating Jizba's Rényi transfer entropy is a bit complicated, since it doesn't have a dedicated estimator. Instead, we re-write the Rényi transfer entropy as a Rényi conditional mutual information, and estimate it using an EntropyDecomposition with a suitable discrete/differential Rényi entropy estimator from the list below as its input.

| Estimator | Sub-estimator | Principle |
| --- | --- | --- |
| EntropyDecomposition | LeonenkoProzantoSavani | Four-entropies decomposition |
| EntropyDecomposition | ValueBinning | Four-entropies decomposition |
| EntropyDecomposition | Dispersion | Four-entropies decomposition |
| EntropyDecomposition | OrdinalPatterns | Four-entropies decomposition |
| EntropyDecomposition | UniqueElements | Four-entropies decomposition |
| EntropyDecomposition | TransferOperator | Four-entropies decomposition |

Any of these estimators must be given as input to a CMIDecomposition estimator.

Estimation

CausalityTools.TEShannonType
TEShannon <: TransferEntropy
TEShannon(; base = 2, embedding = EmbeddingTE())

The Shannon-type transfer entropy measure.

Usage

  • Use with association to compute the raw transfer entropy.
  • Use with an IndependenceTest to perform a formal hypothesis test for pairwise and conditional dependence.

Description

The transfer entropy from source $S$ to target $T$, potentially conditioned on $C$ is defined as

\[\begin{align*} TE(S \to T) &:= I^S(T^+; S^- | T^-) \\ TE(S \to T | C) &:= I^S(T^+; S^- | T^-, C^-) \end{align*}\]

where $I^S(T^+; S^- | T^-)$ is the Shannon conditional mutual information (CMIShannon). The - and + superscripts on the marginal variables $T^+$, $T^-$, $S^-$ and $C^-$ indicate that the embedding vectors for that marginal are constructed using present/past values and future values, respectively.

Estimation

CausalityTools.TEVarsType
TEVars(Tf::Vector{Int}, T::Vector{Int}, S::Vector{Int})
TEVars(Tf::Vector{Int}, T::Vector{Int}, S::Vector{Int}, C::Vector{Int})
TEVars(;Tf = Int[], T = Int[], S = Int[], C = Int[]) -> TEVars

Which axes of the state space correspond to the future of the target (Tf), the present/past of the target (T), the present/past of the source (S), and the present/past of any conditioned variables (C)? This information is used by the transfer entropy estimators to ensure that marginal distributions are computed correctly.

Indices correspond to variables of the embedding or, equivalently, columns of a StateSpaceSet.

  • The three-argument constructor assumes there will be no conditional variables.
  • The four-argument constructor assumes there will be conditional variables.
CausalityTools.VariationDistanceType
VariationDistance <: DivergenceOrDistance

The variation distance.

Usage

  • Use with association to compute the variation distance between two pre-computed probability distributions, or from raw data using one of the estimators listed below.

Compatible estimators

Description

The variation distance between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_n))$, both defined over the same OutcomeSpace $\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as

\[D_{V}(P_X(\Omega) || P_Y(\Omega)) = \dfrac{1}{2} \sum_{\omega \in \Omega} | p_x(\omega) - p_y(\omega) |\]

Examples
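A minimal sketch with two pre-computed probability distributions, where the result can be checked by hand; Probabilities comes from ComplexityMeasures.jl and is assumed to be re-exported.

using CausalityTools
p1 = Probabilities([0.1, 0.4, 0.5])
p2 = Probabilities([0.3, 0.3, 0.4])
association(VariationDistance(), p1, p2) # 0.5 * (0.2 + 0.1 + 0.1) = 0.2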

CausalityTools.Zhu1Type
Zhu1 <: TransferEntropyEstimator
Zhu1(k = 1, w = 0, base = MathConstants.e)

The Zhu1 transfer entropy estimator Zhu2015 for normalized input data (as described in Zhu2015), applicable to both pairwise and conditional transfer entropy.

Compatible definitions

Usage

Description

This estimator approximates probabilities within hyperrectangles surrounding each point xᵢ ∈ x using k nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension to the entropy estimator in Singh2003.

w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

Description

For a given point jᵢ in the joint embedding space, this estimator first computes the distance dᵢ from jᵢ to its k-th nearest neighbor. Then, in each of the marginal spaces, it counts the number of points mₖ[i] within radius dᵢ of the corresponding marginal point.

The Shannon transfer entropy is then computed as

\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]

where the index k references the three marginal subspaces T, TTf and ST for which neighbor searches are performed. Here this estimator has been modified to allow for conditioning too (a simple modification to Lindner2011's equation 5 and 6).

Example

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Zhu1(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)
CausalityTools._convert_logunitMethod
_convert_logunit(h_a::Real, a, to) → h_b

Convert a number h_a computed with logarithms to base a to an entropy h_b computed with logarithms to base b. This can be used to convert the "unit" of e.g. an entropy.

CausalityTools.associationMethod
association(estimator::AssociationMeasureEstimator, x, y, [z, ...]) → r
association(definition::AssociationMeasure, x, y, [z, ...]) → r

Estimate the (conditional) association between input variables x, y, z, … using the given estimator (an AssociationMeasureEstimator) or definition (an AssociationMeasure).

Info

The type of the return value r depends on the measure/estimator. The interpretation of the returned value also depends on the specific measure and estimator used.

Examples

The examples section of the online documentation has numerous examples of using association.
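For instance, measures that require no separate estimator can be passed directly as the first argument:

using CausalityTools
using Random; rng = MersenneTwister(1234)
x = randn(rng, 1000)
y = x .+ randn(rng, 1000)
association(PearsonCorrelation(), x, y) # a definition given directly
association(DistanceCorrelation(), x, y) # another measure needing no separate estimator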

CausalityTools.backwards_eliminate!Method
backwards_eliminate!(alg::OCE, parents::OCESelectedParents, x, i; verbose)

Algorithm 2.2 in Sun et al. (2015). Perform backward elimination for the i-th variable in x, given the previously inferred parents, which were deduced using parameters in alg. Modifies parents in-place.

CausalityTools.codified_marginalsFunction
codified_marginals(o::OutcomeSpace, x::VectorOrStateSpaceSet...)

Encode/discretize each input vector (e.g. timeseries) xᵢ ∈ x according to a procedure determined by o.

For some outcome spaces, the encoding is sequential (i.e. time ordering matters). Any xᵢ ∈ x that are multidimensional (StateSpaceSets) will be encoded column-wise, i.e. each column of xᵢ is treated as a timeseries and is encoded separately.

This is useful for discretizing input data when computing some MultivariateInformationMeasure. This method is used internally by both the JointProbabilities and EntropyDecomposition estimators to handle discretization.

Supported estimators

  • ValueBinning. Bin visitation frequencies are counted in the joint space XY, then marginal visitations are obtained from the joint bin visits. This behaviour is the same for both FixedRectangularBinning and RectangularBinning (which adapts the grid to the data). When using FixedRectangularBinning, the range along the first dimension is used as a template for all other dimensions.
  • OrdinalPatterns. Each timeseries is separately codify-ed by embedding the timeseries, then sequentially encoding the ordinal patterns of the embedding vectors.
  • Dispersion. Each timeseries is separately codify-ed by embedding the timeseries, then sequentially encoding the embedding vectors according to their dispersion pattern (which for each embedding vector is computed relative to all other embedding vectors).
  • CosineSimilarityBinning. Each timeseries is separately codify-ed by embedding the timeseries, then encoding the embedding points in a sequential manner according to the cosine similarity of the embedding vectors.
  • UniqueElements. Each timeseries is codify-ed according to its unique values (i.e. each unique element gets assigned a specific integer).

More implementations are possible.

CausalityTools.cpdagMethod
cpdag(alg::PC, skeleton::SimpleDiGraph, separating_sets::Dict{Edge, Vector{Int}}) → dg::SimpleDiGraph

Orient edges in the skeleton graph using the given separating_sets using algorithm 2 in Kalisch2008 and return the directed graph cpdag.

Description

First, a directed graph dg is constructed from skeleton, such that every undirected edge X - Y in skeleton is replaced by the bidirectional edge X ↔ Y in dg. In practice, for each X ↔ Y, we construct two directional edges X → Y and Y → X.

Orientation rules 0-3 are then applied to dg. We use the rules as stated in Colombo & Maathuis, 2014.

  • Rule 0 (orients v-structures): X ↔ Y ↔ Z becomes X → Y ← Z if Y is not in the separating set S(X, Z).
  • Rule 1 (prevents new v-structures): X → Y ↔ Z becomes X → Y → Z if X and Z are not adjacent.
  • Rule 2 (avoids cycles): X → Y → Z ↔ X becomes X → Y → Z ← X.
  • Rule 3: To avoid creating cycles or new v-structures, whenever X - Y → Z, X - W → Z, and X - Z but there is no edge between Y and W, orient the undirected edge X - Z as X → Z.
CausalityTools.crossmapFunction
crossmap(measure::CrossmapMeasure, t::AbstractVector, s::AbstractVector) → ρ::Real
crossmap(measure::CrossmapMeasure, est, t::AbstractVector, s::AbstractVector) → ρ::Vector
crossmap(measure::CrossmapMeasure, t̄::AbstractVector, S̄::AbstractStateSpaceSet) → ρ

Compute the cross map estimates between raw time series t and s (and return the real-valued cross-map statistic ρ). If a CrossmapEstimator est is provided, cross mapping is done on random subsamples of the data, where subsampling is dictated by est (a vector of values for ρ is returned).

Alternatively, cross-map between a time-aligned target time series and a source embedding that have been constructed by jointly (pre-)embedding some input data.

This is just a wrapper around predict that simply returns the correspondence measure between the source and the target.

CausalityTools.distance_covarianceMethod
distance_covariance(x, y) → dcov::Real
distance_covariance(x, y, z) → pdcov::Real

Compute the empirical/sample distance covariance Székely2007 between StateSpaceSets x and y. Alternatively, compute the partial distance covariance pdcov.

CausalityTools.eliminate_loop!Method
eliminate_loop!(alg::OCE, parents::OCESelectedParents, xᵢ; verbose = false)

Inner portion of algorithm 2.2 in Sun et al. (2015). This method is called in an external while-loop that handles the variable elimination step in their line 3.

CausalityTools.estimator_with_overridden_parametersMethod
estimator_with_overridden_parameters(definition, lower_level_estimator) → e::typeof(lower_level_estimator)

Given some higher-level definition of an information measure, which is to be estimated using some lower_level_estimator, return a modified version of the estimator in which its parameters have been overridden by any overlapping parameters from the definition.

This method is explicitly extended for each possible decomposition.

CausalityTools.fishers_zMethod
fishers_z(p̂)

Compute Fisher's z-transform on the sample partial correlation coefficient (computed as the correlation between variables i and j, given the remaining variables):

\[Z(V_i, V_j | \bf{S}) = \dfrac{1}{2} \log{\left( \dfrac{1 + \hat{p}(V_i, V_j | \bf{S})}{1 - \hat{p}(V_i, V_j | \bf{S})} \right) }\]

CausalityTools.infer_graphFunction
infer_graph(algorithm::GraphAlgorithm, x) → g

Infer graph from input data x using the given algorithm.

Returns g, whose type depends on algorithm.

CausalityTools.library_indicesFunction
library_indices(measure::CCMLike, est::CrossmapEstimator, i::Int,  target, source)

Produce (randomly, if relevant) the i-th subset of indices for a CrossmapEstimator that is being applied to target and source.

CausalityTools.marginalMethod
marginal(p::Probabilities; dims = 1:ndims(p))
marginal(c::Counts; dims = 1:ndims(c))

Given a set of counts c (a contingency table), or a multivariate probability mass function p, return the marginal counts/probabilities along the given dims.
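A minimal sketch that builds a contingency table with counts and UniqueElements; the variable names and data are illustrative.

using CausalityTools
x = rand(["a", "b"], 100)
y = rand(["p", "q", "r"], 100)
c = counts(UniqueElements(), x, y) # 2×3 contingency table of co-occurrence counts
marginal(c; dims = 1) # marginal counts for the first variable (x)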

CausalityTools.marginal_indicesMethod
marginal_indices(x)

Returns a column vector v with the same number of elements as there are unique elements in x. v[i] contains the indices of the elements in x that match the i-th unique element of x.

For example, if the third unique element in x, u₃ = unique(x)[3], appears four times in x, then v[3] is a vector of four integers indicating the positions of the elements matching u₃.

CausalityTools.marginal_probs_from_μMethod
marginal_probs_from_μ(seleced_axes, visited_bins, iv::InvariantMeasure, inds_non0measure)

Estimate marginal probabilities from a pre-computed invariant measure, given a set of visited bins, an invariant measure and the indices of the positive-measure bins. The indices in selected_axes determine which marginals are selected.

CausalityTools.max_inputs_varsMethod
max_inputs_vars(m::AssociationMeasure) → nmax::Int

Return the maximum number of variables that the measure can be computed for.

For example, MIShannon cannot be computed for more than 2 variables.

CausalityTools.max_segmentlengthFunction
max_segmentlength(x::AbstractVector, measure::CrossmapMeasure)

Given an input vector x, which is representative of the size of the other input vectors too, compute the maximum segment/library length that can be used for predictions.

CausalityTools.min_inputs_varsMethod
min_inputs_vars(m::AssociationMeasure) → nmin::Int

Return the minimum number of variables that the measure can be computed for.

For example, CMIShannon requires 3 input variables.

CausalityTools.optimize_marginals_teFunction
optimize_marginals_te([scheme = OptimiseTraditional()], s, t, [c]) → EmbeddingTE

Optimize marginal embeddings for transfer entropy computation from source time series s to target time series t, conditioned on c if c is given, using the provided optimization scheme.

CausalityTools.optimize_marginals_teMethod
optimize_marginals_te(opt::OptimiseTraditional, s, t, [c]; exclude_source = false) → EmbeddingTE

Optimise the marginals for a transfer entropy analysis from source time series s to target time series t, potentially given a conditional time series c.

If exclude_source == true, then no optimisation is done for the source. This is useful for SurrogateAssociationTest, because most surrogate methods accept univariate time series, and if we embed the source and it becomes multidimensional, then we can't create surrogates. A future optimization is to do column-wise surrogate generation.

CausalityTools.pairwise_testMethod
pairwise_test(p::OCESelectedParents) → pairwise::Bool

If the parent set is empty, return true (a pairwise test should be performed). If the parent set is nonempty, return false (a conditional test should be performed).

CausalityTools.predictMethod
predict(measure::CrossmapEstimator, t::AbstractVector, s::AbstractVector) → t̂ₛ, t̄, ρ
predict(measure::CrossmapEstimator, t̄::AbstractVector, S̄::AbstractStateSpaceSet) → t̂ₛ

Perform point-wise cross mappings between source embeddings and target time series according to the algorithm specified by the given cross-map measure (e.g. ConvergentCrossMapping or PairwiseAsymmetricInference).

  • First method: Jointly embeds the target t and source s time series (according to measure) to obtain a time-index aligned target timeseries and source embedding (which is now a StateSpaceSet). Then calls predict(measure, t̄, S̄) (the second method), and returns the predictions t̂ₛ, the observations t̄, and their correspondence ρ according to measure.
  • Second method: Returns a vector of predictions t̂ₛ (t̂ₛ := "predictions of t̄ based on the source embedding S̄"), where t̂ₛ[i] is the prediction for t̄[i]. It assumes pre-embedded data which have been correctly time-aligned using a joint embedding, i.e. such that t̄[i] and S̄[i] correspond to the same time index.

Description

For each i ∈ {1, 2, …, N} where N = length(t) == length(s), we make the prediction t̂[i] (an estimate of t[i]) based on a linear combination of D + 1 other points in t, where the selection of points and weights for the linear combination are determined by the D+1 nearest neighbors of the point S̄[i]. The details of point selection and weights depend on measure.

Note: Some CrossmapMeasures may define more general mapping procedures. If so, the algorithm is described in their docstring.

CausalityTools.rank_transformationMethod
rank_transformation(x::AbstractVector)
rank_transformation(x::AbstractStateSpaceSet) → ranks::NTuple{D, Vector}

Rank-transform each variable/column of the length-n D-dimensional StateSpaceSet x and return the rank-transformed variables as a D-tuple of length-n vectors.

Returns the unscaled ranks. Divide by n to get an approximation to the empirical cumulative distribution function (ECDF) of x.

Description

Modulo division by n, rank_transformation does roughly the same as naively computing the ECDF as

[count(x .<= xᵢ)  for xᵢ in x] / length(x)

but an order of magnitude faster and with roughly three orders of magnitude fewer allocations. The increased efficiency of this function relative to naively computing the ECDF is because it uses sorting of the input data to determine ranks, arbitrarily breaking ties according to the sorting algorithm. Rank ties can therefore never occur, and equal values are assigned different but close ranks. To preserve ties, which you might want to do for example when dealing with categorical or integer-valued data, use (the much slower) empcdf.

CausalityTools.s_measureMethod
s_measure(measure::SMeasure, x::VectorOrStateSpaceSet, y::VectorOrStateSpaceSet) → s ∈ [0, 1]

Compute the given measure to quantify the directional dependence between univariate/multivariate time series x and y.

Returns a scalar s where s = 0 indicates independence between x and y, and higher values indicate synchronization between x and y, with complete synchronization for s = 1.0.

Example

using CausalityTools

x, y = rand(1000), rand(1000)

# 4-dimensional embedding for `x`, 5-dimensional embedding for `y`
m = SMeasure(dx = 4, τx = -3, dy = 5, τy = -1)
association(m, x, y)
CausalityTools.select_parentsMethod
select_parents(alg::OCE, x)

The parent selection step of the OCE algorithm, which identifies the parents of each xᵢ ∈ x. x must be integer-indexable, i.e. x[i] yields the i-th variable.

CausalityTools.skeletonMethod
skeleton(algorithm::PC, x) → (g, s)

Infer the skeleton graph for the variables x using the provided algorithm. x must be some iterable where the i-th variable can be accessed as x[i].

Return a tuple of the undirected skeleton graph g::SimpleGraph and the separating sets s::Dict{SimpleEdge, Vector{Int}}.

CausalityTools.skeleton_conditional!Method
skeleton_conditional!(alg::PC, graph, x, conditional_test::IndependenceTest)

Thin the skeleton graph, where each vertex is represented by the data x[i], by using conditional_test. Whenever x[i] ⫫ x[j] | x[S] for some set of variables not including i and j, the edge between i and j is removed, and S is stored in the separating set for i and j. This is essentially algorithm 3.2 in Colombo2014, for the cases 𝓁 >= 1.

Modifies graph in-place.

CausalityTools.skeleton_unconditional!Method
skeleton_unconditional!(alg::PC, g::SimpleGraph, x)

Perform pairwise independence tests between all vertices in the graph, where each vertex is represented by the data x[i], and remove any edges in graph where adjacent vertices are found to be independent according to the given independence test. The null hypothesis of independence is rejected whenever the p-value is below α.

This is essentially algorithm 3.2 in Colombo2014, but only considering the case 𝓁 = 0.

Modifies graph in-place.

If alg.pairwise_test is a directed test, then edges are considered one-by-one. If alg.pairwise_test is not a directed test, then the edge pairs (X → Y, Y → X) are considered simultaneously.

CausalityTools.te_embedMethod
te_embed(source::VectorOr1DDataset, target::VectorOr1DDataset, p::EmbeddingTE) → (points, vars, τs)
te_embed(source::VectorOr1DDataset, target::VectorOr1DDataset, cond::VectorOr1DDataset, p::EmbeddingTE) → (points, vars, τs)

Generalised delay reconstruction of source and target (and cond if provided) for transfer entropy computation using embedding parameters provided by the EmbeddingTE instance p.

Returns a tuple of the embedded points; vars, a TEVars instance that keeps track of which variables of the embedding belong to which marginals of the reconstruction (indices are: source = 1, target = 2, cond = 3); and a tuple τs, which stores the lags for each variable of the reconstruction.
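
A minimal, hedged sketch (it assumes the argument order shown in the signature above and that `EmbeddingTE()` constructs default embedding parameters):

using CausalityTools

s, t = rand(500), rand(500)                          # source and target time series
p = EmbeddingTE()                                    # default embedding parameters (assumed constructor)
points, vars, τs = CausalityTools.te_embed(s, t, p)  # embedded points, variable bookkeeping, and lags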

CausalityTools.tolevels!Method
tolevels!(levels, x) → levels, dict
tolevels(x) → levels, dict

Apply the bijective map $f : \mathcal{Q} \to \mathbb{N}^+$ to each x[i] and store the result in levels[i], where levels is a pre-allocated integer vector such that length(x) == length(levels).

$\mathcal{Q}$ can be any space, and each $q \in \mathcal{Q}$ is mapped to a unique integer in the range 1, 2, …, length(unique(x)). This is useful for integer-encoding categorical data such as strings, or other complex discrete data structures.

The single-argument method allocates the levels vector internally.

dict gives the inverse mapping, i.e. from integers back to the original values.
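
A minimal, hedged sketch (it assumes `tolevels` is accessed through the module, since it may not be exported; the exact integer assignment shown is an example only):

using CausalityTools

x = ["yes", "no", "yes", "maybe"]
levels, dict = CausalityTools.tolevels(x)  # e.g. levels == [1, 2, 1, 3]; `dict` maps integers back to values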

CausalityTools.verify_decomposition_entropy_typeMethod
verify_decomposition_entropy_type(
    definition::MultivariateInformationMeasure, 
    est::Union{DiscreteInfoEstimator, DifferentialInfoEstimator}
)

Check that the measure given by definition can actually be decomposed in terms of est.definition. The default is to do nothing; certain definitions may override this check (e.g. CMIRenyiJizba does so).

ComplexityMeasures.codifyMethod
codify(encoding::CodifyPoints{N}, x::Vararg{<:AbstractStateSpaceSet, N})

Codify each timeseries xᵢ ∈ x according to the given encoding.

Examples

using CausalityTools

x = StateSpaceSet(rand(10000, 2))
y = StateSpaceSet(rand(10000, 3))
z = StateSpaceSet(rand(10000, 2))

# For `x`, we use a relative mean encoding.
ex = RelativeMeanEncoding(0.0, 1.0, n = 3)
# For `y`, we use a combination encoding.
ey = CombinationEncoding(
    RelativeMeanEncoding(0.0, 1.0, n = 2), 
    OrdinalPatternEncoding(3)
)
# For `z`, we use ordinal patterns to encode.
ez = OrdinalPatternEncoding(2)

# Codify two input datasets gives a 2-tuple of Vector{Int}
codify(CodifyPoints(ex, ey), x, y)

# Codify three input datasets gives a 3-tuple of Vector{Int}
codify(CodifyPoints(ex, ey, ez), x, y, z)
ComplexityMeasures.codifyMethod
codify(d::CodifyVariables, x::Vararg{<:AbstractStateSpaceSet, N})
codify(d::CodifyPoints, x::Vararg{<:AbstractStateSpaceSet, N})

Codify each timeseries xᵢ ∈ x according to the given encoding/discretization d.

Compatible discretizations

  • CodifyVariables (sliding-window encoding of each variable)
  • CodifyPoints (point-by-point encoding)

Examples

using CausalityTools

# Sliding window encoding
x = [0.1, 0.2, 0.3, 0.2, 0.1, 0.0, 0.5, 0.3, 0.5]
xc1 = codify(CodifyVariables(OrdinalPatterns(m=2)), x) # should give [1, 1, 2, 2, 2, 1, 2, 1]
xc2 = codify(OrdinalPatterns(m=2), x) # equivalent
length(xc1) < length(x) # should be true, because `OrdinalPatterns` delay embeds.  

# Point-by-point encoding
x, y = StateSpaceSet(rand(100, 3)), StateSpaceSet(rand(100, 3))
cx, cy = codify(CodifyPoints(OrdinalPatternEncoding(3)), x, y)
ComplexityMeasures.countsMethod
counts(o::UniqueElements, x₁, x₂, ..., xₙ) → Counts{N}
counts(encoding::CodifyPoints, x₁, x₂, ..., xₙ) → Counts{N}
counts(encoding::CodifyVariables, x₁, x₂, ..., xₙ) → Counts{N}

Construct an N-dimensional contingency table from the input iterables x₁, x₂, ..., xₙ which are such that length(x₁) == length(x₂) == ⋯ == length(xₙ).

If x₁, x₂, ..., xₙ are already discrete, then use UniqueElements as the first argument to directly construct the joint contingency table.

If x₁, x₂, ..., xₙ need to be discretized, provide as the first argument

  • CodifyPoints (encodes every point in each of the input variables xᵢs individually)
  • CodifyVariables (encodes every xᵢ individually using a sliding window encoding). NB: If using different OutcomeSpaces for the different xᵢ, then total_outcomes must be the same for every outcome space.

Examples

using CausalityTools

# Discretizing some non-discrete data using a sliding-window encoding for each variable
x, y = rand(100), rand(100)
c = CodifyVariables(OrdinalPatterns(m = 4))
counts(c, x, y)

# Discretizing the data by binning each individual data point
binning = RectangularBinning(3)
encoding = RectangularBinEncoding(binning, [x; y]) # give input values to ensure binning covers all data
c = CodifyPoints(encoding)
counts(c, x, y)

# Counts table for already discrete data
n = 50 # all variables must have the same number of elements
x = rand(["dog", "cat", "mouse"], n)
y = rand(1:3, n)
z = rand([(1, 2), (2, 1)], n)

counts(UniqueElements(), x, y, z)

See also: CodifyPoints, CodifyVariables, UniqueElements, OutcomeSpace, probabilities.

ComplexityMeasures.probabilitiesMethod
probabilities(o::UniqueElements, x₁, x₂, ..., xₙ) → Probabilities{N}
probabilities(encoding::CodifyPoints, x₁, x₂, ..., xₙ) → Probabilities{N}
probabilities(encoding::CodifyVariables, x₁, x₂, ..., xₙ) → Probabilities{N}

Construct an N-dimensional Probabilities array from the input iterables x₁, x₂, ..., xₙ which are such that length(x₁) == length(x₂) == ⋯ == length(xₙ).

Description

Probabilities are computed by first constructing a joint contingency matrix in the form of a Counts instance.

If x₁, x₂, ..., xₙ are already discrete, then use UniqueElements as the first argument to directly construct the joint contingency table.

If x₁, x₂, ..., xₙ need to be discretized, provide as the first argument

  • CodifyPoints (encodes every point in each of the input variables xᵢs individually)
  • CodifyVariables (encodes every xᵢ individually using a sliding window encoding).

Examples

using CausalityTools

# Discretizing some non-discrete data using a sliding-window encoding for each variable
x, y = rand(100), rand(100)
c = CodifyVariables(OrdinalPatterns(m = 4))
probabilities(c, x, y)

# Discretizing the data by binning each individual data point
binning = RectangularBinning(3)
encoding = RectangularBinEncoding(binning, [x; y]) # give input values to ensure binning covers all data
c = CodifyPoints(encoding)
probabilities(c, x, y)

# Joint probabilities for already discretized data
n = 50 # all variables must have the same number of elements
x = rand(["dog", "cat", "mouse"], n)
y = rand(1:3, n)
z = rand([(1, 2), (2, 1)], n)

probabilities(UniqueElements(), x, y, z)

See also: CodifyPoints, CodifyVariables, UniqueElements, OutcomeSpace.

DelayEmbeddings.embedMethod
embed(measure::CrossmapMeasure, target::AbstractVector,
    sources::AbstractVector...) → emb::StateSpaceSet{D}

Jointly embed the input vector target, together with one or more vectors s ∈ sources, according to the given CrossmapMeasure.

This produces emb, a D-dimensional StateSpaceSet where

  • The last column is always the non-lagged target variable. Typically, this is the variable we're trying to predict.
  • The first D-1 columns are the (possibly lagged) versions of each source time series s ∈ sources. Typically, emb[:, 1:D-1] is the subspace in which neighborhood searches are done, which forms the basis of cross-map predictions (see the sketch below).
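
A minimal, hedged sketch of the joint embedding (the ConvergentCrossMapping keyword arguments used here are assumptions):

using CausalityTools

t, s = rand(300), rand(300)                      # target and source time series
measure = ConvergentCrossMapping(d = 3, τ = -1)  # embedding dimension and lag (assumed keywords)
emb = embed(measure, t, s)                       # StateSpaceSet; last column is the non-lagged target
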
StatsAPI.pvalueMethod
pvalue(test::CorrTest, z, c::Int, n::Int)

Compute the two-sided p-value for the test of the partial correlation coefficient being zero, where c is the cardinality of the conditioning set and n is the number of samples.
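
For orientation, a hedged sketch of the standard Fisher z-transform p-value formula for (partial) correlation; that this is exactly what the method computes, and that z is already Fisher-transformed, are assumptions:

using Distributions  # hypothetical dependency in this sketch, for the standard normal CDF

# Two-sided p-value under the null of zero partial correlation, using the
# approximate standard normality of sqrt(n - c - 3) * z under the null.
corrtest_pvalue(z, c, n) = 2 * (1 - cdf(Normal(), sqrt(n - c - 3) * abs(z)))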
