ComplexityMeasures.ComplexityMeasuresModule

ComplexityMeasures.jl

ComplexityMeasures.jl is a Julia-based software package for calculating thousands of various kinds of probabilities, entropies, and other so-called complexity measures from single-variable input datasets. For relational measures across many input datasets, see its extension CausalityTools.jl. If you are a user of other programming languages (Python, R, MATLAB, ...), you can still use ComplexityMeasures.jl due to Julia's interoperability. For example, for Python use juliacall.

A careful comparison with alternative widely used software shows that ComplexityMeasures.jl outclasses the alternatives in several objective aspects of comparison, such as computational performance, overall amount of measures, reliability, and extendability. See the associated publication for more details.

The key features that it provides can be summarized as:

  • A rigorous framework for extracting probabilities from data, based on the mathematical formulation of probability spaces.
  • Several (12+) outcome spaces, i.e., ways to discretize data into probabilities.
  • Several estimators for estimating probabilities given an outcome space, which correct theoretically known estimation biases.
  • Several definitions of information measures, such as various flavours of entropies (Shannon, Tsallis, Curado...), extropies, and other complexity measures, that are used in the context of nonlinear dynamics, nonlinear timeseries analysis, and complex systems.
  • Several discrete and continuous (differential) estimators for entropies, which correct theoretically known estimation biases.
  • An extendable interface and a well-thought-out API, accompanied by dedicated developer documentation. This makes it trivial to define new outcome spaces, or new estimators for probabilities, information measures, or complexity measures, and to integrate them with everything else in the software without boilerplate code.

ComplexityMeasures.jl can be used as a standalone package, or as part of other projects in the JuliaDynamics organization, such as DynamicalSystems.jl or CausalityTools.jl.

To install it, run import Pkg; Pkg.add("ComplexityMeasures").

All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.

Previously, this package was called Entropies.jl.

ComplexityMeasures.AddConstantType
AddConstant <: ProbabilitiesEstimator
AddConstant(; c = 1.0)

A generic add-constant probabilities estimator for counting-based OutcomeSpaces, where several estimators from the literature can be obtained by tuning c. Currently, $c$ can only be a scalar.

  • c = 1.0 is the Laplace estimator, or the "add-one" estimator.

Description

Probabilities for the $k$-th outcome $\omega_{k}$ are estimated as

\[p(\omega_k) = \dfrac{(n_k + c)}{n + mc},\]

where $m$ is the cardinality of the outcome space, $n$ is the number of (encoded) input data points, and $n_k$ is the number of times the outcome $\omega_{k}$ is observed in the (encoded) input data points.

If the AddConstant estimator is used with probabilities_and_outcomes, then $m$ is set to the number of observed outcomes. If used with allprobabilities_and_outcomes, then $m$ is set to the number of possible outcomes.

Unobserved outcomes are assigned nonzero probability!

Looking at the formula above, if $n_k = 0$, then unobserved outcomes are assigned a non-zero probability of $\dfrac{c}{n + mc}$. This means that if the estimator is used with allprobabilities_and_outcomes, then all outcomes, even those that are not observed, are assigned non-zero probabilities. This might affect your results if using e.g. missing_outcomes.
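
Here is a minimal usage sketch. It assumes the counting-compatible UniqueElements outcome space and the probabilities(estimator, outcome_space, x) call pattern shown in other docstrings here; the toy data are hypothetical.

using ComplexityMeasures
x = rand(1:3, 100)                            # integer-valued toy data
o = UniqueElements()                          # a counting-compatible outcome space
p = probabilities(AddConstant(c = 1.0), o, x) # Laplace ("add-one") smoothing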

ComplexityMeasures.AlizadehArghamiType
AlizadehArghami <: DifferentialInfoEstimator
AlizadehArghami(definition = Shannon(); m::Int = 1)

The AlizadehArghami estimator computes the Shannon differential information of a timeseries using the method from Alizadeh2010, with logarithms to the base specified in definition.

The AlizadehArghami estimator belongs to a class of differential entropy estimators based on order statistics. It only works for timeseries input.

Description

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function $f : \mathbb{R} \to \mathbb{R}$. AlizadehArghami estimates the Shannon differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$:

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp.\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The AlizadehArghami Shannon differential entropy estimate is then the Vasicek estimate $\hat{H}_{V}(\bar{X}, m, n)$, plus a correction factor

\[\hat{H}_{A}(\bar{X}, m, n) = \hat{H}_{V}(\bar{X}, m, n) + \dfrac{2}{n}\left(m \log(2) \right).\]

See also: information, Correa, Ebrahimi, Vasicek, DifferentialInfoEstimator.
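
A minimal usage sketch, assuming the information(est, x) call pattern for differential estimators; the standard-normal reference value is included only as a sanity check.

using ComplexityMeasures
x = randn(10_000)                              # samples from a standard normal
h = information(AlizadehArghami(Shannon(; base = ℯ); m = 4), x)
# the true differential entropy of a standard normal is 0.5*log(2π*ℯ) ≈ 1.42 nats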

ComplexityMeasures.AmplitudeAwareOrdinalPatternsType
AmplitudeAwareOrdinalPatterns <: OutcomeSpace
AmplitudeAwareOrdinalPatterns{m}(τ = 1, A = 0.5, lt = ComplexityMeasures.isless_rand)

A variant of OrdinalPatterns that also incorporates amplitude information, based on the amplitude-aware permutation entropy Azami2016. The outcome space and arguments are the same as in OrdinalPatterns.

Description

Similarly to WeightedOrdinalPatterns, a weight $w_i$ is attached to each ordinal pattern extracted from each state (or delay) vector $\mathbf{x}_i = (x_1^i, x_2^i, \ldots, x_m^i)$ as

\[w_i = \dfrac{A}{m} \sum_{k=1}^m |x_k^i | + \dfrac{1-A}{m-1} \sum_{k=2}^m |x_{k}^i - x_{k-1}^i|,\]

with $0 \leq A \leq 1$. When $A=0$, only internal differences between the elements of $\mathbf{x}_i$ are weighted. When $A=1$, only the mean amplitude of the state vector elements is weighted. With $0<A<1$, a combined weighting is used.

ComplexityMeasures.ApproximateEntropyType
ApproximateEntropy <: ComplexityEstimator
ApproximateEntropy([x]; r = 0.2std(x), kwargs...)

An estimator for the approximate entropy Pincus1991 complexity measure, used with complexity.

The keyword argument r is mandatory if an input timeseries x is not provided.

Keyword arguments

  • r::Real: The radius used when querying for nearest neighbors around points. Its value should be determined from the input data, for example as some proportion of the standard deviation of the data.
  • m::Int = 2: The embedding dimension.
  • τ::Int = 1: The embedding lag.
  • base::Real = MathConstants.e: The base to use for the logarithm. Pincus (1991) uses the natural logarithm.

Description

Approximate entropy (ApEn) is defined as

\[ApEn(m ,r) = \lim_{N \to \infty} \left[ \phi(x, m, r) - \phi(x, m + 1, r) \right].\]

Approximate entropy is estimated for a timeseries x, by first embedding x using embedding dimension m and embedding lag τ, then searching for similar vectors within tolerance radius r, using the estimator described below, with logarithms to the given base (natural logarithm is used in Pincus, 1991).

Specifically, for a finite-length timeseries x, an estimator for $ApEn(m ,r)$ is

\[ApEn(m, r, N) = \phi(x, m, r, N) - \phi(x, m + 1, r, N),\]

where N = length(x) and

\[\phi(x, k, r, N) = \dfrac{1}{N-(k-1)\tau} \sum_{i=1}^{N - (k-1)\tau} \log{\left( \sum_{j = 1}^{N-(k-1)\tau} \dfrac{\theta(d({\bf x}_i^m, {\bf x}_j^m) \leq r)}{N-(k-1)\tau} \right)}.\]

Here, $\theta(\cdot)$ returns 1 if the argument is true and 0 otherwise, $d({\bf x}_i, {\bf x}_j)$ returns the Chebyshev distance between vectors ${\bf x}_i$ and ${\bf x}_j$, and the k-dimensional embedding vectors are constructed from the input timeseries $x(t)$ as

\[{\bf x}_i^k = (x(i), x(i+τ), x(i+2τ), \ldots, x(i+(k-1)\tau)).\]

Flexible embedding lag

In the original paper, they fix τ = 1. In our implementation, the normalization constant is modified to account for embeddings with τ != 1.
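
A minimal usage sketch with hypothetical toy data; the radius r defaults to 0.2*std(x) because x is passed to the constructor.

using ComplexityMeasures
x = randn(1_000)
est = ApproximateEntropy(x; m = 2, τ = 1)  # r = 0.2*std(x) by default
complexity(est, x)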

ComplexityMeasures.BayesianRegularizationType
BayesianRegularization <: ProbabilitiesEstimator
BayesianRegularization(; a = 1.0)

The BayesianRegularization estimator is used with probabilities and related functions to estimate probabilities over an m-element counting-based OutcomeSpace using Bayesian regularization of cell counts Hausser2009. See ProbabilitiesEstimator for usage.

Outcome space requirements

This estimator only works with counting-compatible outcome spaces.

Description

The BayesianRegularization estimator estimates the probability of the $k$-th outcome $\omega_{k}$ as

\[p(\omega_{k})^{\text{BayesianRegularization}} = \dfrac{n_k + a_k}{n + A},\]

where $n$ is the number of samples in the input data, $n_k$ is the observed count for the outcome $\omega_{k}$, and $A = \sum_{i=1}^{m} a_i$.

Picking a

There are many common choices of priors, some of which are listed in Hausser2009. They include

  • a == 0, which is equivalent to the RelativeAmount estimator.
  • a == 0.5 (Jeffrey's prior)
  • a == 1 (Bayes-Laplace uniform prior)

a can also be chosen as a vector of real numbers. Then, if used with allprobabilities_and_outcomes, it is required that length(a) == total_outcomes(o, x), where x is the input data and o is the OutcomeSpace. If used with probabilities, then length(a) must match the number of observed outcomes (you can check this using probabilities_and_outcomes). The choice of a can severely impact the estimation errors of the probabilities, and the errors depend both on the choice of a and on the sampling scenario Hausser2009.

Assumptions

The BayesianRegularization estimator assumes a fixed and known m. Thus, using it with probabilities_and_outcomes and allprobabilities_and_outcomes will yield different results, depending on whether all outcomes are observed in the input data or not. For probabilities_and_outcomes, m is the number of observed outcomes. For allprobabilities_and_outcomes, m = total_outcomes(o, x), where o is the OutcomeSpace and x is the input data.

Note

If used with allprobabilities_and_outcomes, then outcomes which have not been observed may be assigned non-zero probabilities. This might affect your results if using e.g. missing_outcomes.

Examples

using ComplexityMeasures
x = cumsum(randn(100))
ps_bayes = probabilities(BayesianRegularization(a = 0.5), OrdinalPatterns{3}(), x)

See also: RelativeAmount, Shrinkage.

ComplexityMeasures.BubbleEntropyType
BubbleEntropy <: ComplexityEstimator
BubbleEntropy(; m = 3, τ = 1, definition = Renyi(q = 2))

The BubbleEntropy complexity estimator Manis2017 is just a difference between two entropies, each computed with the BubbleSortSwaps outcome space, for embedding dimensions m + 1 and m, respectively.

Manis2017 use the Renyi entropy of order q = 2 as the information measure definition, but here you can use any InformationMeasure. Manis2017 formulates the "bubble entropy" as the normalized measure below, while here you can also compute the unnormalized measure.

Definition

For input data x, the "bubble entropy" is computed by first embedding the input data using embedding dimension m and embedding delay τ (call the embedded pts y), and then computing the difference between the two entropies:

\[BubbleEn_T(τ) = H_T(y, m + 1) - H_T(y, m)\]

where $H_T(y, m)$ and $H_T(y, m + 1)$ are entropies of type $T$ (e.g. Renyi) computed with the input data x embedded to dimension $m$ and $m+1$, respectively. Use complexity to compute this non-normalized version. Use complexity_normalized to compute the normalized difference of entropies:

\[BubbleEn_T(τ)^{norm} = \dfrac{H_T(y, m + 1) - H_T(y, m)}{\max(H_T(y, m + 1)) - \max(H_T(y, m))},\]

where the maximum of the entropies for dimensions m and m + 1 are computed using information_maximum.

Example

using ComplexityMeasures
x = rand(1000)
est = BubbleEntropy(m = 5, τ = 3)
complexity(est, x)
ComplexityMeasures.BubbleSortSwapsType
BubbleSortSwaps <: CountBasedOutcomeSpace
BubbleSortSwaps(; m = 3, τ = 1)

The BubbleSortSwaps outcome space is based on Manis2017's paper on "bubble entropy".

Description

BubbleSortSwaps does the following:

  • Embeds the input data using embedding dimension m and embedding lag τ
  • For each state vector in the embedding, counts how many swaps are necessary for the bubble sort algorithm to sort it.

For counts_and_outcomes, we then define a distribution over the number of necessary swaps. This distribution can then be used to estimate probabilities using probabilities_and_outcomes, which again can be used to estimate any InformationMeasure. An example of how to compute the "Shannon bubble entropy" is given below.

Outcome space

The outcome_space for BubbleSortSwaps is the set of integers 0:N, where N = (m * (m - 1)) / 2 is the worst-case number of swaps. Hence, the number of total_outcomes is N + 1.

Implements

  • codify. Returns the number of swaps required for each embedded state vector.

Examples

With the BubbleSortSwaps outcome space, we can easily compute a "bubble entropy" inspired by Manis2017. Note: this is not actually a new entropy - it is just a new way of discretizing the input data. To reproduce the bubble entropy complexity measure from Manis2017, see BubbleEntropy.

using ComplexityMeasures
x = rand(100000)
o = BubbleSortSwaps(; m = 5) # 5-dimensional embedding vectors
information(Shannon(; base = 2), o, x)

# We can also compute any other "bubble quantity", for example the 
# "Tsallis bubble extropy", with arbitrary probabilities estimators:
information(TsallisExtropy(), BayesianRegularization(), o, x)
ComplexityMeasures.BubbleSortSwapsEncodingType
BubbleSortSwapsEncoding <: Encoding
BubbleSortSwapsEncoding{m}()

BubbleSortSwapsEncoding is used with encode to encode a length-m input vector x into an integer in the range ω ∈ 0:((m*(m-1)) ÷ 2), by counting the number of swaps required for the bubble sort algorithm to sort x in ascending order.

decode is not implemented for this encoding.

Example

using ComplexityMeasures
x = [1, 5, 3, 1, 2]
e = BubbleSortSwapsEncoding{5}() # constructor type argument must match length of vector 
encode(e, x)
ComplexityMeasures.ChaoShenType
ChaoShen <: DiscreteInfoEstimatorShannon
ChaoShen(definition::Shannon = Shannon())

The ChaoShen estimator is used with information to compute the discrete Shannon entropy according to Chao2003.

Description

This estimator is a modification of the HorvitzThompson estimator that multiplies each plugin probability estimate by an estimate of sample coverage. If $f_1$ is the number of singletons (outcomes that occur only once) in a sample of length $N$, then the sample coverage is $C = 1 - \dfrac{f_1}{N}$. The Chao-Shen estimator of Shannon entropy is then

\[H_S^{CS} = -\sum_{i=1}^M \left( \dfrac{C p_i \log(C p_i)}{1 - (1 - C p_i)^N} \right),\]

where $N$ is the sample size and $M$ is the number of outcomes. If $f_1 = N$, then $f_1$ is set to $f_1 = N - 1$ to ensure positive entropy Arora2022.
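
A minimal usage sketch, assuming the information(est, outcome_space, x) call pattern used in other docstrings here; the outcome space choice is illustrative.

using ComplexityMeasures
x = randn(1_000)
# coverage-adjusted Shannon entropy of the ordinal-pattern distribution
h = information(ChaoShen(Shannon(; base = 2)), OrdinalPatterns{3}(), x)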

ComplexityMeasures.CombinationEncodingType
CombinationEncoding <: Encoding
CombinationEncoding(encodings)

A CombinationEncoding takes multiple Encodings and creates a combined encoding that can be used to encode inputs that are compatible with the given encodings.

Encoding/decoding

When used with encode, each Encoding in encodings returns integers in the set 1, 2, …, n_e, where n_e is the total number of outcomes for a particular encoding. For k different encodings, we can thus construct the cartesian coordinate (c₁, c₂, …, cₖ) (cᵢ ∈ 1, 2, …, n_i), which can uniquely be identified by an integer. We can thus identify each unique combined encoding with a single integer.

When used with decode, the integer symbol is converted to its corresponding cartesian coordinate, which is used to retrieve the decoded symbols for each of the encodings, and a tuple of the decoded symbols is returned.

The total number of outcomes is prod(total_outcomes(e) for e in encodings).

Examples

using ComplexityMeasures

# We want to encode the vector `x`.
x = [0.9, 0.2, 0.3]

# To do so, we will use a combination of first-difference encoding, amplitude encoding,
# and ordinal pattern encoding.

encodings = (
    RelativeFirstDifferenceEncoding(0, 1; n = 2),
    RelativeMeanEncoding(0, 1; n = 5),
    OrdinalPatternEncoding(3) # x is a three-element vector
    )
c = CombinationEncoding(encodings)

# Encode `x` as integer
ω = encode(c, x)

# Decode symbol (into a vector of decodings, one for each encodings `e ∈ encodings`).
# In this particular case, the first two elements will be left-bin edges, and
# the last element will be the decoded ordinal pattern (indices that would sort `x`).
d = decode(c, ω)
ComplexityMeasures.CompositeDownsamplingType
CompositeDownsampling <: MultiScaleAlgorithm
CompositeDownsampling(; f::Function = Statistics.mean, scales = 1:8)

Composite multi-scale algorithm for multiscale entropy analysis Wu2013, used with multiscale to compute, for example, composite multiscale entropy (CMSE).

Description

Given a scalar-valued input time series x, the composite multiscale algorithm, like RegularDownsampling, downsamples and coarse-grains x by splitting it into non-overlapping windows of length s, and then constructing downsampled time series by applying the function f to each of the resulting length-s windows.

However, Wu2013 realized that for each scale s, there are actually s different ways of selecting windows, depending on where indexing starts/ends. These s different downsampled time series D_t(s, f) at each scale s are constructed as follows:

\[\{ D_{k}(s) \} = \{ D_{t, k}(s) \}_{t = 1}^{L} = \{ f \left( \bf x_{t, k} \right) \} = \left\{ {f\left( (x_i)_{i = (t - 1)s + k}^{ts + k - 1} \right)} \right\}_{t = 1}^{L},\]

where L = floor((N - s + 1) / s) and 1 ≤ k ≤ s, such that $D_{t, k}(s)$ is the t-th element of the k-th downsampled time series at scale s.

Finally, compute $\dfrac{1}{s} \sum_{k = 1}^s g(D_{k}(s))$, where g is some summary function, for example information or complexity.
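
The coarse-graining above can be illustrated with a small hypothetical helper (composite_downsample is not part of the package API; it just implements the formula for the k-th window offset, with the mean as the aggregation function f).

using Statistics
# k-th of the s possible coarse-grainings of x at scale s, aggregated with f
function composite_downsample(x, s, k; f = mean)
    L = floor(Int, (length(x) - s + 1) / s)
    return [f(view(x, (t - 1)*s + k : t*s + k - 1)) for t in 1:L]
end
x = randn(1_000)
D1 = composite_downsample(x, 3, 1)   # k = 1 reproduces RegularDownsampling
D2 = composite_downsample(x, 3, 2)   # window start shifted by one sample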

Keyword Arguments

  • scales. The downsampling levels. If scales is set to an integer, then this integer is taken as the maximum number of scales (i.e. levels of downsampling), and downsampling is done over levels 1:scales. Otherwise, downsampling is done over the provided scales, which may be a range or a collection of specific scales (e.g. scales = [1, 5, 6]). The maximum scale level is length(x) ÷ 2, but to avoid applying the method to time series that are extremely short, consider limiting the maximum scale (e.g. scales = length(x) ÷ 5).

Relation to RegularDownsampling

The downsampled time series $D_{t, 1}(s)$ constructed using the composite multiscale method is equivalent to the downsampled time series $D_{t}(s)$ constructed using the RegularDownsampling method, for which k == 1 is fixed, such that only a single time series is returned.

See also: RegularDownsampling.

ComplexityMeasures.CorreaType
Correa <: DifferentialInfoEstimator
Correa(definition = Shannon(); m::Int = 1)

The Correa estimator computes the Shannon differential information of a timeseries using the method from Correa1995, with logarithms to the base specified in definition.

The Correa estimator belongs to a class of differential entropy estimators based on order statistics. It only works for timeseries input.

Description

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function $f : \mathbb{R} \to \mathbb{R}$. Correa estimates the Shannon differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, Correa makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$, ensuring that end points are included. The Correa estimate of Shannon differential entropy is then

\[H_C(\bar{X}, m, n) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{ \sum_{j=i-m}^{i+m}(\bar{X}_{(j)} - \tilde{X}_{(i)})(j - i)}{n \sum_{j=i-m}^{i+m} (\bar{X}_{(j)} - \tilde{X}_{(i)})^2} \right],\]

where

\[\tilde{X}_{(i)} = \dfrac{1}{2m + 1} \sum_{j = i - m}^{i + m} X_{(j)}.\]

See also: information, AlizadehArghami, Ebrahimi, Vasicek, DifferentialInfoEstimator.

ComplexityMeasures.CosineSimilarityBinningType
CosineSimilarityBinning(; m::Int, τ::Int, nbins::Int)

An OutcomeSpace based on the cosine similarity Wang2020.

It can be used with information to compute the "diversity entropy" of an input timeseries Wang2020.

The implementation here allows for τ != 1, which was not considered in the original paper.

Description

CosineSimilarityBinning probabilities are computed as follows.

  1. From the input time series x, using embedding lag τ and embedding dimension m, construct the embedding $Y = \{\bf x_i \} = \{(x_{i}, x_{i+\tau}, x_{i+2\tau}, \ldots, x_{i+(m-1)\tau})\}_{i = 1}^{N-mτ}$.
  2. Compute $D = \{d(\bf x_t, \bf x_{t+1}) \}_{t=1}^{N-mτ-1}$, where $d(\cdot, \cdot)$ is the cosine similarity between two m-dimensional vectors in the embedding.
  3. Divide the interval [-1, 1] into nbins equally sized subintervals (including the value +1).
  4. Construct a histogram of cosine similarities $d \in D$ over those subintervals.
  5. Sum-normalize the histogram to obtain probabilities.

Outcome space

The outcome space for CosineSimilarityBinning is the bins of the [-1, 1] interval, and the return configuration is the same as in ValueBinning (left bin edge).

Implements

  • codify. Used for encoding inputs where ordering matters (e.g. time series).
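
A minimal usage sketch for computing the "diversity entropy" of a timeseries, assuming the information(measure, outcome_space, x) call pattern used in other docstrings here.

using ComplexityMeasures
x = randn(1_000)
o = CosineSimilarityBinning(m = 3, τ = 1, nbins = 20)
information(Shannon(), o, x)   # "diversity entropy" of x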
ComplexityMeasures.CountsType
Counts <: Array{<:Integer, N}
Counts(counts [, outcomes [, dimlabels]]) → c

Counts stores an N-dimensional array of integer counts corresponding to a set of outcomes. This is typically called a "frequency table" or "contingency table".

If c isa Counts, then c.outcomes[i] is an abstract vector containing the outcomes along the i-th dimension, where c[i][j] is the count corresponding to the outcome c.outcomes[i][j], and c.dimlabels[i] is the label of the i-th dimension. Both labels and outcomes are assigned automatically if not given. c itself can be manipulated and iterated over like its stored array.
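
A minimal sketch of how a Counts object typically arises, assuming counts_and_outcomes(outcome_space, x) returns the counts together with the corresponding outcomes.

using ComplexityMeasures
x = rand(1:3, 100)
cts, outs = counts_and_outcomes(UniqueElements(), x)
cts isa Counts   # true; cts[i] is the count of the outcome outs[i]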

ComplexityMeasures.CuradoType
Curado <: InformationMeasure
Curado(; b = 1.0)

The Curado entropy Curado2004, used with information to compute

\[H_C(p) = \left[ \sum_{i=1}^N \left( 1 - e^{-b p_i} \right) \right] + e^{-b} - 1,\]

with b ∈ ℝ, b > 0, where the terms outside the sum ensure that $H_C(0) = H_C(1) = 0$.

The maximum entropy for Curado is $L(1 - \exp(-b/L)) + \exp(-b) - 1$ with $L$ the total_outcomes.

ComplexityMeasures.DifferentialInfoEstimatorType
DifferentialInfoEstimator

The supertype of all differential information measure estimators. These estimators compute an information measure in various ways that do not involve explicitly estimating a probability distribution.

Each DifferentialInfoEstimator uses a specialized technique to approximate relevant densities/integrals, and is often tailored to one or a few types of information measures. For example, Kraskov estimates the Shannon entropy.

See information for usage.

Implementations

ComplexityMeasures.DiscreteInfoEstimatorType
DiscreteInfoEstimator

The supertype of all discrete information measure estimators, which are used in combination with a ProbabilitiesEstimator as input to information or related functions.

The first argument to a discrete estimator is always an InformationMeasure (defaults to Shannon).

Description

A discrete InformationMeasure is a functional of a probability mass function. To estimate such a measure from data, we must first estimate a probability mass function using a ProbabilitiesEstimator from the (encoded/discretized) input data, and then apply the estimator to the estimated probabilities. For example, the Shannon entropy is typically computed using the RelativeAmount estimator to compute probabilities, which are then given to the PlugIn estimator. Many other estimators exist, not only for Shannon entropy, but other information measures as well.

We provide a library of both generic estimators such as PlugIn or Jackknife (which can be applied to any measure), as well as dedicated estimators such as MillerMadow, which computes Shannon entropy using the Miller-Madow bias correction. The list below gives a complete overview.

Implementations

The following estimators are generic and can compute any InformationMeasure.

  • PlugIn. The default, generic plug-in estimator of any information measure. It computes the measure exactly as stated in the definition, using the computed probability mass function.
  • Jackknife. Uses a combination of the plug-in estimator and the jackknife principle to estimate the information measure.

Shannon entropy estimators

The following estimators are dedicated Shannon entropy estimators, which provide improvements over the naive PlugIn estimator.

Info

Any of the implemented DiscreteInfoEstimators can be used in combination with any ProbabilitiesEstimator as input to information. What this means is that every estimator actually comes in many different variants - one for each ProbabilitiesEstimator. For example, the MillerMadow estimator of Shannon entropy is typically calculated with RelativeAmount probabilities. But here, you can use for example the BayesianRegularization or the Shrinkage probabilities estimators instead, i.e. for some outcome space o, information(MillerMadow(), RelativeAmount(), o, x) and information(MillerMadow(), BayesianRegularization(), o, x) are distinct estimators. This holds for all DiscreteInfoEstimators. Many of these estimators haven't been explored in the literature before, so feel free to explore, and please cite this software if you use it to explore some new estimator combination!

ComplexityMeasures.DispersionType
Dispersion(; c = 5, m = 2, τ = 1, check_unique = true)

An OutcomeSpace based on dispersion patterns, originally used by Rostaghi2016 to compute the "dispersion entropy", which characterizes the complexity and irregularity of a time series.

Recommended parameter values Li2018 are m ∈ [2, 3], τ = 1 for the embedding, and c ∈ [3, 4, …, 8] categories for the Gaussian symbol mapping.

Description

Assume we have a univariate time series $X = \{x_i\}_{i=1}^N$. First, this time series is encoded into a symbol timeseries $S$ using the Gaussian encoding GaussianCDFEncoding with empirical mean μ and empirical standard deviation σ (both determined from $X$), and c as given to Dispersion.

Then, $S$ is embedded into an $m$-dimensional time series, using an embedding lag of $\tau$, which yields a total of $N - (m - 1)\tau$ delay vectors $z_i$, or "dispersion patterns". Since each element of $z_i$ can take on c different values, and each delay vector has m entries, there are c^m possible dispersion patterns. This number is used for normalization when computing dispersion entropy.

The returned probabilities are simply the frequencies of the unique dispersion patterns present in $S$ (i.e., the UniqueElements of $S$).

Outcome space

The outcome space for Dispersion is the set of unique delay vectors whose elements are the symbols (integers) encoded by the Gaussian CDF, i.e., the unique elements of $S$.

Data requirements and parameters

The input must have more than one unique element for the Gaussian mapping to be well-defined. Li2018 recommends that x has at least 1000 data points.

If check_unique == true (default), then it is checked that the input has more than one unique value. If check_unique == false and the input only has one unique element, then an InexactError is thrown when trying to compute probabilities.

Why 'dispersion patterns'?

Each embedding vector is called a "dispersion pattern". Why? Let's consider the case when $m = 5$ and $c = 3$, and use some very imprecise terminology for illustration:

When $c = 3$, values clustering far below the mean are in one group, values clustered around the mean are in another group, and values clustering far above the mean are in a third group. Then the embedding vector $[2, 2, 2, 2, 2]$ consists of values that are close together (close to the mean), so it represents a set of numbers that are not very spread out (less dispersed). The embedding vector $[1, 1, 2, 3, 3]$, however, represents numbers that are much more spread out (more dispersed), because the categories representing "outliers" both above and below the mean are represented, not only values close to the mean.

For a version of this estimator that can be used on high-dimensional arrays, see SpatialDispersion.

Implements

  • codify. Used for encoding inputs where ordering matters (e.g. time series).
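
A minimal usage sketch for computing (normalized) dispersion entropy, assuming the information_normalized(measure, outcome_space, x) call pattern; parameter values are illustrative.

using ComplexityMeasures
x = randn(1_000)
o = Dispersion(c = 4, m = 3, τ = 1)
# dispersion entropy, normalized by the log of the c^m possible patterns
information_normalized(Shannon(), o, x)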
ComplexityMeasures.EbrahimiType
Ebrahimi <: DifferentialInfoEstimator
Ebrahimi(definition = Shannon(); m::Int = 1)

The Ebrahimi estimator computes the Shannon information of a timeseries using the method from Ebrahimi1994, with logarithms to the base specified in definition.

The Ebrahimi estimator belongs to a class of differential entropy estimators based on order statistics. It only works for timeseries input.

Description

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function $f : \mathbb{R} \to \mathbb{R}$. Ebrahimi estimates the Shannon differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The Ebrahimi Shannon differential entropy estimate is then

\[\hat{H}_{E}(\bar{X}, m) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{n}{c_i m} (\bar{X}_{(i+m)} - \bar{X}_{(i-m)}) \right],\]

where

\[c_i = \begin{cases} 1 + \frac{i - 1}{m}, & 1 \leq i \leq m \\ 2, & m + 1 \leq i \leq n - m \\ 1 + \frac{n - i}{m}, & n - m + 1 \leq i \leq n \end{cases}.\]

See also: information, Correa, AlizadehArghami, Vasicek, DifferentialInfoEstimator.

ComplexityMeasures.EncodingType
Encoding

The supertype for all encoding schemes. Encodings always encode elements of input data into the positive integers. The encoding API is defined by the functions encode and decode. Some probability estimators utilize encodings internally.

Currently available encodings are:

ComplexityMeasures.FixedRectangularBinningType
FixedRectangularBinning(range::AbstractRange, D::Int = 1, precise = false)

This is a convenience method where each dimension of the binning has the same range and the input data are D dimensional, which defaults to 1 (timeseries).

ComplexityMeasures.FixedRectangularBinningType
FixedRectangularBinning <: AbstractBinning
FixedRectangularBinning(ranges::Tuple{<:AbstractRange...}, precise = false)

Rectangular box partition of state space where the partition along each dimension is explicitly given by each range ranges, which is a tuple of AbstractRange subtypes. Typically, each range is the output of the range Base function, e.g., ranges = (0:0.1:1, range(0, 1; length = 101), range(2.1, 3.2; step = 0.33)). All ranges must be sorted.

The optional second argument precise dictates whether Julia Base's TwicePrecision is used when searching for which bin of the range a point falls into. This is useful for edge cases where points lie almost exactly on bin edges, but it is exactly four times as slow, so by default it is false.

Points falling outside the partition do not contribute to probabilities. Bins are always left-closed-right-open: [a, b). This means that the last value of each of the ranges dictates the last right-closing value. This value does not belong to the histogram! E.g., if given a range r = range(0, 1; length = 11), with r[end] = 1, the value 1 is outside the partition and does not contribute to the probability of the last bin (here [0.9, 1))!

Equivalently, the size of the histogram is histsize = map(r -> length(r)-1, ranges)!

FixedRectangularBinning leads to a well-defined outcome space without knowledge of input data, see ValueBinning.
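
A minimal usage sketch of the bin semantics described above, assuming ValueBinning accepts a FixedRectangularBinning.

using ComplexityMeasures
x = rand(1_000)
# 10 bins of width 0.1 over [0, 1); the right edge 1.0 is not part of the histogram
binning = FixedRectangularBinning(range(0, 1; length = 11))
probabilities(ValueBinning(binning), x)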

ComplexityMeasures.FluctuationComplexityType
FluctuationComplexity <: InformationMeasure
FluctuationComplexity(; definition = Shannon(; base = 2), base = 2)

The "fluctuation complexity" quantifies the standard deviation of the information content of the states $\omega_i$ around some summary statistic (InformationMeasure) of a PMF. Specifically, given some outcome space $\Omega$ with outcomes $\omega_i \in \Omega$ and a probability mass function $p(\Omega) = \{ p(\omega_i) \}_{i=1}^N$, it is defined as

\[\sigma_I(p) := \sqrt{\sum_{i=1}^N p_i(I_i - H_*)^2}\]

where $I_i = -\log_{base}(p_i)$ is the information content of the i-th outcome. The type of information measure $*$ is controlled by definition.

The base controls the base of the logarithm that goes into the information content terms. Make sure that you pick a base that is consistent with the base chosen for the definition (relevant for e.g. Shannon).

Properties

If definition is the Shannon entropy, then we recover the Shannon-type information fluctuation complexity from Bates1993. Then the fluctuation complexity is zero for PMFs with only a single non-zero element, or for the uniform distribution.

If definition is not Shannon entropy, then the properties of the measure vary, and do not necessarily match the properties described in Bates1993.

Potential for new research

As far as we know, using other information measures besides Shannon entropy for the fluctuation complexity hasn't been explored in the literature yet. Our implementation, however, allows for it. Please inform us if you try some new combinations!
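
A minimal usage sketch, assuming the information(measure, outcome_space, x) call pattern used in other docstrings here; the ordinal outcome space is an arbitrary choice.

using ComplexityMeasures
x = randn(1_000)
measure = FluctuationComplexity(definition = Shannon(; base = 2), base = 2)
# fluctuation of local information content around the Shannon entropy of the
# ordinal-pattern distribution of x
information(measure, OrdinalPatterns{3}(), x)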

ComplexityMeasures.GaoType
Gao <: DifferentialInfoEstimator
Gao(definition = Shannon(); k = 1, w = 0, corrected = true)

The Gao estimator Gao2015 computes the Shannon differential information, using a k-th nearest-neighbor approach based on Singh2003, with logarithms to the base specified in definition.

w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

Gao2015 give two variants of this estimator. If corrected == false, then the uncorrected version is used. If corrected == true, then the corrected version is used, which ensures that the estimator is asymptotically unbiased.

Description

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. Gao estimates the Shannon differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

ComplexityMeasures.GaussianCDFEncodingType
GaussianCDFEncoding <: Encoding
GaussianCDFEncoding{m}(; μ, σ, c::Int = 3)

An encoding scheme that encodes a scalar or vector χ into one of the integers sᵢ ∈ [1, 2, …, c] based on the normal cumulative distribution function (NCDF), and decodes the sᵢ into subintervals of [0, 1] (with some loss of information).

Initializing a GaussianCDFEncoding

The size of the input to be encoded must be known beforehand. One must therefore set m = length(χ), where χ is the input (m = 1 for scalars, m ≥ 2 for vectors). To do so, one must explicitly give m as a type parameter: e.g. encoding = GaussianCDFEncoding{3}(; μ = 0.0, σ = 0.1) to encode 3-element vectors, or encoding = GaussianCDFEncoding{1}(; μ = 0.0, σ = 0.1) to encode scalars.

Description

Encoding/decoding scalars

GaussianCDFEncoding first maps an input scalar $χ$ to a new real number $y \in [0, 1]$ by using the normal cumulative distribution function (CDF) with the given mean μ and standard deviation σ, according to the map

\[x \to y : y = \dfrac{1}{ \sigma \sqrt{2 \pi}} \int_{-\infty}^{x} e^{(-(x - \mu)^2)/(2 \sigma^2)} dx.\]

Next, the interval [0, 1] is equidistantly binned and enumerated $1, 2, \ldots, c$, and $y$ is linearly mapped to one of these integers using the linear map $y \to z : z = \text{floor}(y(c-1)) + 1$.

Because of the floor operation, some information is lost, so when used with decode, each decoded sᵢ is mapped to a subinterval of [0, 1]. This subinterval is returned as a length-1 Vector{SVector}.

Notice that the decoding step does not yield an element of any outcome space of the estimators that use GaussianCDFEncoding internally, such as Dispersion. That is because these estimators additionally delay embed the encoded data.

Encoding/decoding vectors

If GaussianCDFEncoding is used with a vector χ, then each element of χ is encoded separately, resulting in a length(χ) sequence of integers which may be treated as a CartesianIndex. The encoded symbol s ∈ [1, 2, …, c] is then just the linear index corresponding to this cartesian index (similar to how CombinationEncoding works).

When decoded, the integer symbol s is converted back into its CartesianIndex representation, which is just a sequence of integers that refer to subdivisions of the [0, 1] interval. The relevant subintervals are then returned as a length-χ Vector{SVector}.

Examples

julia> using ComplexityMeasures, Statistics

julia> x = [0.1, 0.4, 0.7, -2.1, 8.0];

julia> μ, σ = mean(x), std(x); encoding = GaussianCDFEncoding(; μ, σ, c = 5)

julia> es = encode.(Ref(encoding), x)
5-element Vector{Int64}:
 2
 2
 3
 1
 5

julia> decode(encoding, 3)
2-element SVector{2, Float64} with indices SOneTo(2):
 0.4
 0.6
ComplexityMeasures.GeneralizedSchuermannType
GeneralizedSchuermann <: DiscreteInfoEstimatorShannon
GeneralizedSchuermann(definition = Shannon(); a = 1.0)

The GeneralizedSchuermann estimator is used with information to compute the discrete Shannon entropy with the bias-corrected estimator given in Grassberger2022.

The "generalized" part of the name, as opposed to the Schurmann2004 estimator (Schuermann), is due to the possibility of picking different parameters $a_i$ for different outcomes. If different parameters are assigned to the different outcomes, a must be a vector of parameters of length length(outcomes), where the outcomes are obtained using outcomes. See Grassberger2022 for more information. If a is a real number, then $a_i = a \forall i$, and the estimator reduces to the Schuermann estimator.

Description

For a set of $N$ observations over $M$ outcomes, the estimator is given by

\[H_S^{opt} = \varphi(N) - \dfrac{1}{N} \sum_{i=1}^M n_i G_{n_i}(a_i),\]

where $n_i$ is the observed frequency of the i-th outcome,

\[G_n(a) = \varphi(n) + (-1)^n \int_0^a \dfrac{x^{n - 1}}{x + 1} dx,\]

$G_n(1) = G_n$ and $G_n(0) = \varphi(n)$, and

\[G_n = \varphi(n) + (-1)^n \int_0^1 \dfrac{x^{n - 1}}{x + 1} dx.\]
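
A minimal usage sketch with hypothetical categorical data, assuming the information(est, outcome_space, x) call pattern used in other docstrings here.

using ComplexityMeasures
x = rand(1:4, 200)
# bias-corrected Shannon entropy with the same parameter a for all outcomes
information(GeneralizedSchuermann(Shannon(); a = 1.0), UniqueElements(), x)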

ComplexityMeasures.GoriaType
Goria <: DifferentialInfoEstimator
Goria(measure = Shannon(); k = 1, w = 0)

The Goria estimator Goria2005 computes the Shannon differential information of a multi-dimensional StateSpaceSet, with logarithms to the base specified in definition.

Description

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. Goria estimates the Shannon differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

Specifically, let $\bf{n}_1, \bf{n}_2, \ldots, \bf{n}_N$ be the distances of the samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ to their k-th nearest neighbors. Next, let the geometric mean of the distances be

\[\hat{\rho}_k = \left( \prod_{i=1}^N \bf{n}_i \right)^{\dfrac{1}{N}}.\]

Goria2005's estimate of Shannon differential entropy is then

\[\hat{H} = m \log(\hat{\rho}_k) + \log(N - 1) - \psi(k) + \log c_1(m),\]

where $c_1(m) = \dfrac{2\pi^\frac{m}{2}}{m \Gamma(m/2)}$ and $\psi$ is the digamma function.

ComplexityMeasures.HorvitzThompsonType
HorvitzThompson <: DiscreteInfoEstimatorShannon
HorvitzThompson(measure::Shannon = Shannon())

The HorvitzThompson estimator is used with information to compute the discrete Shannon entropy according to Horvitz1952.

Description

The Horvitz-Thompson estimator of Shannon entropy is given by

\[H_S^{HT} = -\sum_{i=1}^M \dfrac{p_i \log(p_i) }{1 - (1 - p_i)^N},\]

where $N$ is the sample size and $M$ is the number of outcomes. Given the true probability $p_i$ of the $i$-th outcome, $1 - (1 - p_i)^N$ is the probability that the outcome appears at least once in a sample of size $N$ Arora2022. Dividing by this inclusion probability is a form of weighting, and compensates for situations where certain outcomes have so low probabilities that they are not often observed in a sample, for example in power-law distributions.

ComplexityMeasures.IdentificationType
Identification <: InformationMeasure
Identification()

Identification entropy Ahlswede2006.

Description

The identification entropy is the functional

\[H_I(p) = 2\left( 1 - \sum_{i=1}^N p_i^2 \right).\]

Details about this entropy definition can be found in Ahlswede2021.

ComplexityMeasures.InformationMeasureType
InformationMeasure

InformationMeasure is the supertype of all information measure definitions.

In this package, we define "information measures" as functionals of probability mass functions ("discrete" measures), or of probability density functions ("differential" measures). Examples are (generalized) entropies such as Shannon or Renyi, or extropies like ShannonExtropy. Amigó2018 provides a useful review of generalized entropies.

Used with

Any of the information measures listed below can be used with

  • information, to compute a numerical value for the measure, given some input data.
  • information_maximum, to compute the maximum possible value for the measure.
  • information_normalized, to compute the normalized form of the measure (divided by the maximum possible value).

The information_maximum/information_normalized functions only work with the discrete version of the measure. See the docstrings of the above functions for usage examples.

Implementations

Estimators

A particular information measure may have both a discrete and a continuous/differential definition, which are estimated using a DiscreteInfoEstimator or a DifferentialInfoEstimator, respectively.

ComplexityMeasures.InformationMeasureEstimatorType
InformationMeasureEstimator{I <: InformationMeasure}

The supertype of all information measure estimators. Its direct subtypes are DiscreteInfoEstimator and DifferentialInfoEstimator.

Since all estimators must reference a measure definition in some way, we made the following interface decisions:

  1. all estimators have as first type parameter I <: InformationMeasure
  2. all estimators reference the information measure in a definition field
  3. all estimators are defined using Base.@kwdef so that they may be initialized with the syntax Estimator(; definition = Shannon()) (or any other).

Any concrete subtypes must follow the above, e.g.:

Base.@kwdef struct MyEstimator{I <: InformationMeasure, X} <: DiscreteInfoEstimator{I}
    definition::I
    x::X
end
Why separate the *definition* of a measure from *estimators* of a measure?

In real applications, we generally don't have access to the underlying probability mass functions or densities required to compute the various entropy or extropy definitions. Therefore, these information measures must be estimated from finite data. Estimating a particular measure (e.g. Shannon entropy) can be done in many ways, each with its own pros and cons. We aim to provide a complete library of literature estimators of the various information measures (PRs are welcome!).

ComplexityMeasures.JackknifeType
Jackknife <: DiscreteInfoEstimatorGeneric
Jackknife(definition::InformationMeasure = Shannon())

The Jackknife estimator is used with information to compute any discrete InformationMeasure.

The Jackknife estimator uses the generic jackknife principle to reduce bias. Zahl1977 was the first to apply the jackknife technique in the context of Shannon entropy estimation. Here, we've generalized his estimator to work with any InformationMeasure.

Description

As an example of the jackknife technique, here is the formula for a jackknife estimate of Shannon entropy

\[H_S^{J} = N H_S^{plugin} - \dfrac{N-1}{N} \sum_{i=1}^N {H_S^{plugin}}^{-\{i\}},\]

where $N$ is the sample size, $H_S^{plugin}$ is the plugin estimate of Shannon entropy, and ${H_S^{plugin}}^{-\{i\}}$ is the plugin estimate, but computed with the $i$-th sample left out.
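
A minimal usage sketch, assuming the information(est, outcome_space, x) call pattern used in other docstrings here.

using ComplexityMeasures
x = randn(1_000)
# jackknife-corrected Shannon entropy of the ordinal-pattern distribution
information(Jackknife(Shannon()), OrdinalPatterns{3}(), x)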

ComplexityMeasures.KaniadakisType
Kaniadakis <: InformationMeasure
Kaniadakis(; κ = 1.0, base = 2.0)

The Kaniadakis entropy Tsallis2009, used with information to compute

\[H_K(p) = -\sum_{i=1}^N p_i f_\kappa(p_i),\]

\[f_\kappa (x) = \dfrac{x^\kappa - x^{-\kappa}}{2\kappa},\]

where if $\kappa = 0$, regular logarithm to the given base is used, and 0 probabilities are skipped.

ComplexityMeasures.KozachenkoLeonenkoType
KozachenkoLeonenko <: DifferentialInfoEstimator
KozachenkoLeonenko(definition = Shannon(); w::Int = 0)

The KozachenkoLeonenko estimator KozachenkoLeonenko1987 computes the Shannon differential information of a multi-dimensional StateSpaceSet, with logarithms to the base specified in definition.

Description

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. KozachenkoLeonenko estimates the Shannon differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))]\]

using the nearest neighbor method from KozachenkoLeonenko1987, as described in Charzyńska2015.

w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

In contrast to Kraskov, this estimator uses only the closest neighbor.

See also: information, Kraskov, DifferentialInfoEstimator.

ComplexityMeasures.KraskovType
Kraskov <: DifferentialInfoEstimator
Kraskov(definition = Shannon(); k::Int = 1, w::Int = 0)

The Kraskov estimator computes the Shannon differential information of a multi-dimensional StateSpaceSet using the k-th nearest neighbor searches method from Kraskov2004, with logarithms to the base specified in definition.

w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

Description

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. Kraskov estimates the Shannon differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

See also: information, KozachenkoLeonenko, DifferentialInfoEstimator.
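
A minimal usage sketch, assuming the information(est, x) call pattern for differential estimators; the 2D standard-normal reference value is included only as a sanity check.

using ComplexityMeasures
x = StateSpaceSet(randn(10_000, 2))   # 2D samples from a standard normal
h = information(Kraskov(Shannon(; base = ℯ); k = 4), x)
# the true differential entropy of a 2D standard normal is log(2π*ℯ) ≈ 2.84 nats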

ComplexityMeasures.LempelZiv76Type
LempelZiv76 <: ComplexityEstimator
LempelZiv76()

The Lempel-Ziv, or LempelZiv76, complexity measure LempelZiv1976, which is used with complexity and complexity_normalized.

For results to be comparable across sequences with different length, use the normalized version. Normalized LempelZiv76-complexity is implemented as given in Amigó2004. The normalized measure is close to zero for very regular signals, while for random sequences, it is close to 1 with high probability[Amigó2004]. Note: the normalized LempelZiv76 complexity can be higher than 1[Amigó2004].

The LempelZiv76 measure applies only to binary sequences, i.e. sequences with a two-element alphabet (precisely two distinct outcomes). For performance optimization, we do not check the number of unique elements in the input. If your input sequence is not binary, you must encode it first using one of the implemented Encoding schemes (or encode your data manually).
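
A minimal sketch of the encode-then-compute workflow, using a simple hypothetical binarization (thresholding at zero) rather than one of the package's Encoding schemes.

using ComplexityMeasures
x = randn(1_000)
y = [xi > 0 ? 1 : 0 for xi in x]   # binary sequence required by LempelZiv76
complexity_normalized(LempelZiv76(), y)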

ComplexityMeasures.LeonenkoProzantoSavaniType
LeonenkoProzantoSavani <: DifferentialInfoEstimator
LeonenkoProzantoSavani(definition = Shannon(); k = 1, w = 0)

The LeonenkoProzantoSavani estimator LeonenkoProzantoSavani2008 computes the Shannon, Renyi, or Tsallis differential information of a multi-dimensional StateSpaceSet, with logarithms to the base specified in definition.

Description

The estimator uses k-th nearest-neighbor searches. w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

For details, see LeonenkoProzantoSavani2008.

ComplexityMeasures.LordType
Lord <: DifferentialInfoEstimator
Lord(measure = Shannon(); k = 10, w = 0)

The Lord estimator Lord2018 estimates the Shannon differential information using a nearest neighbor approach with a local nonuniformity correction (LNC), with logarithms to the base specified in definition.

w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

Description

Assume we have samples $\bar{X} = \{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. Lord estimates the Shannon differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))],\]

by using the resubstitution formula

\[\hat{H}(\bar{X}, k) = -\mathbb{E}[\log(f(X))] \approx -\dfrac{1}{N} \sum_{i = 1}^N \log(\hat{f}(\bf{x}_i)),\]

where $\hat{f}(\bf{x}_i)$ is an estimate of the density at $\bf{x}_i$ constructed in a manner such that $\hat{f}(\bf{x}_i) \propto \dfrac{k(x_i) / N}{V_i}$, where $k(x_i)$ is the number of points in the neighborhood of $\bf{x}_i$, and $V_i$ is the volume of that neighborhood.

While most nearest-neighbor based differential entropy estimators use regular volume elements (e.g. hypercubes, hyperrectangles, hyperspheres) for approximating the local densities $\hat{f}(\bf{x}_i)$, the Lord estimator uses hyperellipsoid volume elements. These hyperellipsoids are, for each query point xᵢ, estimated using singular value decomposition (SVD) on the k-th nearest neighbors of xᵢ. Thus, the hyperellipsoids stretch/compress in response to the local geometry around each sample point. This makes Lord a well-suited entropy estimator for a wide range of systems.

ComplexityMeasures.MillerMadowType
MillerMadow <: DiscreteInfoEstimatorShannon
MillerMadow(measure::Shannon = Shannon())

The MillerMadow estimator is used with information to compute the discrete Shannon entropy according to Miller1955.

Description

The Miller-Madow estimator of Shannon entropy is given by

\[H_S^{MM} = H_S^{plugin} + \dfrac{m - 1}{2N},\]

where $H_S^{plugin}$ is the Shannon entropy estimated using the PlugIn estimator, m is the number of bins with nonzero probability (as defined in Paninski2003), and N is the number of observations.

ComplexityMeasures.MissingDispersionPatternsType
MissingDispersionPatterns <: ComplexityEstimator
MissingDispersionPatterns(o = Dispersion()) → mdp

An estimator for the number of missing dispersion patterns (MDP), a complexity measure which can be used to detect nonlinearity in time series Zhou2023.

Used with complexity or complexity_normalized.

Description

When used with complexity, complexity(mdp, x) is syntactically equivalent to missing_outcomes(o, x). When used with complexity_normalized, the normalization is simply missing_outcomes(o, x)/total_outcomes(o).

Encoding

Dispersion's linear mapping from CDFs to integers is based on equidistant partitioning of the interval [0, 1]. This is slightly different from Zhou2023, which uses the linear mapping $s_i := \text{round}(y + 0.5)$.

Usage

In Zhou2023, MissingDispersionPatterns is used to detect nonlinearity in time series by comparing the MDP for a time series x to values for an ensemble of surrogates of x, as per the standard analysis of TimeseriesSurrogates.jl. If the MDP value of $x$ is significantly larger than some high quantile of the surrogate distribution, then it is taken as evidence for nonlinearity.

See also: Dispersion, ReverseDispersion, total_outcomes.
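
A minimal usage sketch (without the surrogate comparison step described above); parameter values are illustrative.

using ComplexityMeasures
x = randn(1_000)
mdp = MissingDispersionPatterns(Dispersion(c = 3, m = 2))
complexity(mdp, x)              # number of missing dispersion patterns
complexity_normalized(mdp, x)   # fraction of the possible patterns that are missing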

ComplexityMeasures.NaiveKernelType
NaiveKernel(ϵ::Real; method = KDTree, w = 0, metric = Euclidean()) <: OutcomeSpace

An OutcomeSpace based on a "naive" kernel density estimation approach (KDE), as discussed in PrichardTheiler1995.

Probabilities $P(\mathbf{x}, \epsilon)$ are assigned to every point $\mathbf{x}$ by counting how many other points occupy the space spanned by a hypersphere of radius ϵ around $\mathbf{x}$, according to:

\[P_i( X, \epsilon) \approx \dfrac{1}{N} \sum_{s} B(||X_i - X_s|| < \epsilon),\]

where $B$ gives 1 if the argument is true. Probabilities are then normalized.

Keyword arguments

  • method = KDTree: the search structure supported by Neighborhood.jl. Specifically, use KDTree to use a tree-based neighbor search, or BruteForce for the direct distances between all points. KDTrees heavily outperform direct distances when the dimensionality of the data is much smaller than the data length.
  • w = 0: the Theiler window, which excludes indices $s$ that are within $|i - s| ≤ w$ from the given point $x_i$.
  • metric = Euclidean(): the distance metric.

Outcome space

The outcome space Ω for NaiveKernel is the set of indices of the input data, eachindex(x). Hence, input x is needed for a well-defined outcome_space. The reason for not returning the data points themselves is that duplicate data points may not be assigned the same probabilities (due to having different neighbors).
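
A minimal usage sketch, assuming the information(measure, outcome_space, x) call pattern used in other docstrings here; the radius is illustrative.

using ComplexityMeasures
x = StateSpaceSet(randn(2_000, 3))
o = NaiveKernel(0.2)   # hypersphere radius ϵ = 0.2
information(Shannon(), o, x)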

ComplexityMeasures.OrdinalOutcomeSpaceType

The supertype for probability estimators based on permutation patterns.

Subtypes must implement fields:

  • m::Int: The dimension of the permutation patterns.
  • lt::Function: A function determining how ties are to be broken when constructing permutation patterns from embedding vectors.
ComplexityMeasures.OrdinalPatternEncodingType
OrdinalPatternEncoding <: Encoding
OrdinalPatternEncoding{m}(lt = ComplexityMeasures.isless_rand)

An encoding scheme that encodes length-m vectors into their permutation/ordinal patterns and then into the integers based on the Lehmer code. It is used by OrdinalPatterns and similar estimators, see that for a description of the outcome space.

The ordinal/permutation pattern of a vector χ is simply sortperm(χ), which gives the indices that would sort χ in ascending order.

Description

The Lehmer code, as implemented here, is a bijection between the set of factorial(m) possible permutations for a length-m sequence, and the integers 1, 2, …, factorial(m). The encoding step uses algorithm 1 in Berger2019, which is highly optimized. The decoding step is much slower due to missing optimizations (pull requests welcomed!).

Example

julia> using ComplexityMeasures

julia> χ = [4.0, 1.0, 9.0];

julia> c = OrdinalPatternEncoding(3);

julia> i = encode(c, χ)
3

julia> decode(c, i)
3-element SVector{3, Int64} with indices SOneTo(3):
 2
 1
 3

If you want to encode something that is already a permutation pattern, then you can use the non-exported permutation_to_integer function.

ComplexityMeasures.OrdinalPatternsType
OrdinalPatterns <: OutcomeSpace
OrdinalPatterns{m}(τ = 1, lt::Function = ComplexityMeasures.isless_rand)

An OutcomeSpace based on length-m ordinal permutation patterns, originally introduced in BandtPompe2002's paper on permutation entropy. Note that m is given as a type parameter, so that when it is a literal integer there are performance accelerations.

When passed to probabilities the output depends on the input data type:

  • Univariate data. If applied to a univariate timeseries (AbstractVector), then the timeseries is first embedded using embedding delay τ and dimension m, resulting in embedding vectors $\{ \bf{x}_i \}_{i=1}^{N-(m-1)\tau}$. Then, for each $\bf{x}_i$, we find its permutation pattern $\pi_{i}$. Probabilities are then estimated as the frequencies of the encoded permutation symbols by using UniqueElements. When giving the resulting probabilities to information, the original permutation entropy is computed BandtPompe2002.
  • Multivariate data. If applied to a D-dimensional StateSpaceSet, then no embedding is constructed, m must be equal to D and τ is ignored. Each vector $\bf{x}_i$ of the dataset is mapped directly to its permutation pattern $\pi_{i}$ by comparing the relative magnitudes of the elements of $\bf{x}_i$. Like above, probabilities are estimated as the frequencies of the permutation symbols. The resulting probabilities can be used to compute multivariate permutation entropy He2016, although here we don't perform any further subdivision of the permutation patterns (as in Figure 3 of He2016).

Internally, OrdinalPatterns uses the OrdinalPatternEncoding to represent ordinal patterns as integers for efficient computations.

See WeightedOrdinalPatterns and AmplitudeAwareOrdinalPatterns for estimators that not only consider ordinal (sorting) patterns, but also incorporate information about within-state-vector amplitudes. For a version of this estimator that can be used on spatial data, see SpatialOrdinalPatterns.

Handling equal values in ordinal patterns

In BandtPompe2002, equal values are ordered after their order of appearance, but this can lead to erroneous temporal correlations, especially for data with low amplitude resolution Zunino2017. Here, by default, if two values are equal, then one of them is randomly assigned as "the largest", using lt = ComplexityMeasures.isless_rand. To get the behaviour from BandtPompe2002, use lt = Base.isless.

Outcome space

The outcome space Ω for OrdinalPatterns is the set of length-m ordinal patterns (i.e. permutations) that can be formed by the integers 1, 2, …, m. There are factorial(m) such patterns.

For example, the outcome [2, 3, 1] corresponds to the ordinal pattern of having the smallest value in the second position, the next smallest value in the third position, and the next smallest, i.e. the largest value in the first position. See also OrdinalPatternEncoding.

In-place symbolization

OrdinalPatterns also implements the in-place probabilities! for StateSpaceSet input (or embedded vector input) for reducing allocations in looping scenarios. The length of the pre-allocated symbol vector must be the length of the dataset. For example

using ComplexityMeasures
m, N, τ = 2, 100, 1
est = OrdinalPatterns{m}(τ)
x = StateSpaceSet(rand(N, m)) # some input dataset
πs_ts = zeros(Int, N) # length must match length of `x`
p = probabilities!(πs_ts, est, x)
ComplexityMeasures.OutcomeType
Outcome <: Number
Outcome(num::Integer)

A convenience wrapper around an Integer that represents an unspecified but enumerated outcome. It exists to distinguish the case of generic outcomes (which are allocated when using counts) vs actual integer outcomes (which may be allocated when using counts_and_outcomes). It is also used for pretty-printing Counts and Probabilities.

ComplexityMeasures.OutcomeSpaceType
OutcomeSpace

The supertype for all outcome space implementations.

Description

In ComplexityMeasures.jl, an outcome space defines a set of possible outcomes $\Omega = \{\omega_1, \omega_2, \ldots, \omega_L \}$ (some form of discretization). In the literature, the outcome space is often also called an "alphabet", while each outcome is called a "symbol" or an "event".

An outcome space also defines a set of rules for mapping input data to each outcome $\omega_i$, a process called encoding, symbolizing, or discretizing in the literature (see encodings). Some OutcomeSpaces first apply a transformation, e.g. a delay embedding, to the data before discretizing/encoding, while other OutcomeSpaces discretize/encode the data directly.

Implementations

Outcome space | Principle | Input data | Counting-compatible
UniqueElements | Count of unique elements | Any | ✔
ValueBinning | Binning (histogram) | Vector, StateSpaceSet | ✔
OrdinalPatterns | Ordinal patterns | Vector, StateSpaceSet | ✔
SpatialOrdinalPatterns | Ordinal patterns in space | Array | ✔
Dispersion | Dispersion patterns | Vector | ✔
SpatialDispersion | Dispersion patterns in space | Array | ✔
CosineSimilarityBinning | Cosine similarity | Vector | ✔
BubbleSortSwaps | Swap counts when sorting | Vector | ✔
SequentialPairDistances | Sequential state vector distances | Vector, StateSpaceSet | ✔
TransferOperator | Binning (transfer operator) | Vector, StateSpaceSet | ✖
NaiveKernel | Kernel density estimation | StateSpaceSet | ✖
WeightedOrdinalPatterns | Ordinal patterns | Vector, StateSpaceSet | ✖
AmplitudeAwareOrdinalPatterns | Ordinal patterns | Vector, StateSpaceSet | ✖
WaveletOverlap | Wavelet transform | Vector | ✖
PowerSpectrum | Fourier transform | Vector | ✖

In the column "input data" it is assumed that the eltype of the input is <: Real.

Usage

Outcome spaces are used as input to probabilities/probabilities_and_outcomes, counts/counts_and_outcomes, and their allprobabilities_and_outcomes/allcounts_and_outcomes variants.

Counting-compatible vs. non-counting compatible outcome spaces

There are two main types of outcome spaces.

  • Counting-compatible outcome spaces have a well-defined way of counting how often each point in the (encoded) input data is mapped to a particular outcome $\omega_i$. These outcome spaces use encode to discretize the input data. Examples are OrdinalPatterns (which encodes input data into ordinal patterns) or ValueBinning (which discretizes points onto a regular grid). The table above lists which outcome spaces are counting compatible.
  • Non-counting compatible outcome spaces have no well-defined way of counting explicitly how often each point in the input data is mapped to a particular outcome $\omega_i$. Instead, these outcome spaces return a vector of pre-normalized "relative counts", one for each outcome $\omega_i$. Examples are WaveletOverlap or PowerSpectrum.

Counting-compatible outcome spaces can be used with any ProbabilitiesEstimator to convert counts into probability mass functions. Non-counting-compatible outcome spaces can only be used with the maximum likelihood (RelativeAmount) probabilities estimator, which estimates probabilities precisely by the relative frequency of each outcome (formally speaking, the RelativeAmount estimator also requires counts, but for the sake of code consistency, we allow it to be used with relative frequencies as well).

The function is_counting_based can be used to check whether an outcome space is based on counting.
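
As a quick check (a minimal sketch; the outcome spaces used here are documented in the table above):

using ComplexityMeasures
is_counting_based(OrdinalPatterns{3}()) # true: explicit counts are well-defined
is_counting_based(PowerSpectrum())      # false: only pre-normalized "relative counts"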

Deducing the outcome space (from data)

Some outcome space models can deduce $\Omega$ without knowledge of the input, such as OrdinalPatterns. Other outcome spaces require knowledge of the input data for concretely specifying $\Omega$, such as ValueBinning with RectangularBinning. If o is some outcome space model and x some input data, then outcome_space(o, x) returns the possible outcomes $\Omega$. To get the cardinality of $\Omega$, use total_outcomes.
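
For example (a minimal sketch; x is hypothetical input data):

using ComplexityMeasures
x = rand(200)
o1 = OrdinalPatterns{3}()                # Ω is known without reference to the data
outcome_space(o1, x)                     # the factorial(3) = 6 possible ordinal patterns
total_outcomes(o1, x) == 6               # true
o2 = ValueBinning(RectangularBinning(5)) # Ω depends on the data range
outcome_space(o2, x)                     # the 5 bins deduced from x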

Implementation details

The element type of $\Omega$ varies between outcome space models, but it is guaranteed to be hashable and sortable. This allows for conveniently tracking the counts of a specific event across experimental realizations, by using the outcome as a dictionary key and the counts as the value for that key (or, alternatively, the key remains the outcome and one has a vector of probabilities, one for each experimental realization).

ComplexityMeasures.PairDistanceEncodingType
PairDistanceEncoding <: Encoding
PairDistanceEncoding(min_dist, max_dist; n = 2, metric = Chebyshev(), precise = false)

An encoding that encodes point pairs of the form Tuple{<:AbstractVector, <:AbstractVector} by first computing their distance using the given metric, then dividing the interval [min_dist, max_dist] into n equal-size bins, and mapping the computed distance onto one of those bins. Bins are enumerated as 1:n. When decode-ing the bin integer, the left edge of the bin is returned.

precise has the same meaning as in RectangularBinEncoding.

Example

Let's create an example where the minimum and maximum allowed distance is known.

using ComplexityMeasures, Distances, StaticArrays
m = Chebyshev()
y = [SVector(1.0), SVector(0.5), SVector(0.25), SVector(0.64)]
pair1, pair2, pair3 = (y[1], y[2]), (y[2], y[3]), (y[3], y[4])
dmax = m(pair1...) # dist = 0.50
dmin = m(pair2...) # dist = 0.25
dmid = m(pair3...) # dist = 0.39

# This should give five bins with left edges at [0.25], [0.30], [0.35], [0.40] and [0.45]
encoding = PairDistanceEncoding(dmin, dmax; n = 5, metric = m)
c1 = encode(encoding, pair1) # 5
c2 = encode(encoding, pair2) # 1
c3 = encode(encoding, pair3) # 3

decode(encoding, c3) ≈ [0.35] # true
ComplexityMeasures.PlugInType
PlugIn(e::InformationMeasure) <: DiscreteInfoEstimatorGeneric

The PlugIn estimator is also called the empirical/naive/"maximum likelihood" estimator, and is used with information to compute any discrete InformationMeasure.

It computes any quantity exactly as given by its formula. When computing an information measure, which here is defined as a probabilities functional, it computes the quantity directly from a probability mass function, which is derived from maximum-likelihood (RelativeAmount) estimates of the probabilities.

Bias of plug-in estimates

The plugin-estimator of Shannon entropy underestimates the true entropy, with a bias that grows with the number of distinct outcomes (Arora et al., 2022)Arora2022,

\[bias(H_S^{plugin}) = -\dfrac{K-1}{2N} + o(N^{-1}),\]

where K is the number of distinct outcomes, and N is the sample size. Many authors have tried to remedy this by proposing alternative Shannon entropy estimators. For example, the MillerMadow estimator is a simple correction to the plug-in estimator that adds back the bias term above. Many other estimators exist; see DiscreteInfoEstimators for an overview.
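
As a rough illustration, one can compare the plug-in estimate with the Miller-Madow-corrected estimate on the same discretization (a minimal sketch; the default MillerMadow constructor is assumed here):

using ComplexityMeasures
x = randn(500)
o = ValueBinning(RectangularBinning(10))
h_plugin = information(PlugIn(Shannon()), o, x) # plug-in (maximum likelihood) estimate
h_mm = information(MillerMadow(), o, x)         # bias-corrected estimate; typically h_mm ≥ h_plugin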

ComplexityMeasures.PowerSpectrumType
PowerSpectrum() <: OutcomeSpace

An OutcomeSpace based on the power spectrum of a timeseries (amplitude square of its Fourier transform).

If used with probabilities, then the spectrum normalized to sum = 1 is returned as probabilities. The Shannon entropy of these probabilities is typically referred to in the literature as spectral entropy, e.g. Llanos2017 and Tian2017.

The closer the spectrum is to flat, i.e., white noise, the higher the entropy. However, you can't compare entropies of timeseries with different lengths, because the binning in spectral space depends on the length of the input.

Outcome space

The outcome space Ω for PowerSpectrum is the set of frequencies in Fourier space. They should be multiplied with the sampling rate of the signal, which is assumed to be 1. Input x is needed for a well-defined outcome_space.

ComplexityMeasures.PrintComponentType
PrintComponent
PrintComponent(s; color::Union{Symbol,Int} = :normal,
    bold::Bool = false, underline::Bool = false, blink::Bool = false,
    hidden::Bool = false, reverse::Bool = false)

Stores a string s and instructions for how it shall be printed.

PrintComponents are intended for use with printstyled.

ComplexityMeasures.ProbabilitiesType
Probabilities <: Array{<:AbstractFloat, N}
Probabilities(probs::Array [, outcomes [, dimlabels]]) → p
Probabilities(counts::Counts [, outcomes [, dimlabels]]) → p

Probabilities stores an N-dimensional array of probabilities, while ensuring that the array sums to 1 (normalized probability mass). In most cases the array is a standard vector. p itself can be manipulated and iterated over, just like its stored array.

The probabilities correspond to outcomes that describe the axes of the array. If p isa Probabilities, then p.outcomes[i] is an abstract vector containing the outcomes along the i-th dimension. The outcomes have the same ordering as the probabilities, so that p[i][j] is the probability for outcome p.outcomes[i][j]. The dimensions of the array are named, and can be accessed by p.dimlabels, where p.dimlabels[i] is the label of the i-th dimension. Both outcomes and dimlabels are assigned automatically if not given. If the input is a set of Counts, and outcomes and dimlabels are not given, then the labels and outcomes are inherited from the counts.

Examples

julia> probs = [0.2, 0.2, 0.2, 0.2]; Probabilities(probs) # will be normalized to sum to 1
 Probabilities{Float64,1} over 4 outcomes
 Outcome(1)  0.25
 Outcome(2)  0.25
 Outcome(3)  0.25
 Outcome(4)  0.25
julia> c = Counts([12, 16, 12], ["out1", "out2", "out3"]); Probabilities(c)
 Probabilities{Float64,1} over 3 outcomes
 "out1"  0.3
 "out2"  0.4
 "out3"  0.3
ComplexityMeasures.ProbabilitiesEstimatorType
ProbabilitiesEstimator

The supertype for all probabilities estimators.

The role of the probabilities estimator is to convert (pseudo-)counts to probabilities. Currently, the implementations of all probabilities estimators assume a finite outcome space with known cardinality. Therefore, a ProbabilitiesEstimator accepts an OutcomeSpace as its first argument, which specifies the set of possible outcomes.

Probabilities estimators are used with probabilities and allprobabilities_and_outcomes.

Implementations

The default probabilities estimator is RelativeAmount, which is compatible with any OutcomeSpace. The remaining estimators documented here (e.g. AddConstant, BayesianRegularization, Shrinkage) only support counting-based outcome spaces.

Description

In ComplexityMeasures.jl, probability mass functions are estimated from data by defining a set of possible outcomes $\Omega = \{\omega_1, \omega_2, \ldots, \omega_L \}$ (by specifying an OutcomeSpace), and assigning to each outcome $\omega_i$ a probability $p(\omega_i)$, such that $\sum_{i=1}^L p(\omega_i) = 1$ (by specifying a ProbabilitiesEstimator).

ComplexityMeasures.RectangularBinningType
RectangularBinning(ϵ, precise = false) <: AbstractBinning

Rectangular box partition of state space using the scheme ϵ, deducing the histogram extent and bin width from the input data.

RectangularBinning is a convenience struct. It is re-cast into FixedRectangularBinning once the data are provided, so see that docstring for info on the bin calculation and the meaning of precise.

Binning instructions are deduced from the type of ϵ as follows:

  1. ϵ::Int divides each coordinate axis into ϵ equal-length intervals that cover all data.
  2. ϵ::Float64 divides each coordinate axis into intervals of fixed size ϵ, starting from the axis minima until the data is completely covered by boxes.
  3. ϵ::Vector{Int} divides the i-th coordinate axis into ϵ[i] equal-length intervals that cover all data.
  4. ϵ::Vector{Float64} divides the i-th coordinate axis into intervals of fixed size ϵ[i], starting from the axis minima until the data is completely covered by boxes.

RectangularBinning ensures all input data are covered by extending the created ranges if need be.
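
For example, a histogram over 20 equal-length bins per axis can be obtained as follows (a minimal sketch):

using ComplexityMeasures
x = randn(10_000)
o = ValueBinning(RectangularBinning(20))       # 20 equal-length bins covering all data
probs, outs = probabilities_and_outcomes(o, x) # bins are identified by their leftmost corners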

ComplexityMeasures.RegularDownsamplingType
RegularDownsampling <: MultiScaleAlgorithm
RegularDownsampling(; f::Function = Statistics.mean, scales = 1:8)

The original multi-scale algorithm for multiscale entropy analysis Costa2002, which yields a single downsampled time series per scale s.

Description

Given a scalar-valued input time series x, the Regular multiscale algorithm downsamples and coarse-grains x by splitting it into non-overlapping windows of length s, and then constructing a new downsampled time series $D_t(s, f)$ by applying the function f to each of the resulting length-s windows.

The downsampled time series D_t(s) with t ∈ [1, 2, …, L], where L = floor(N / s), is given by:

\[\{ D_t(s, f) \}_{t = 1}^{L} = \left\{ f \left( \bf x_t \right) \right\}_{t = 1}^{L} = \left\{ {f\left( (x_i)_{i = (t - 1)s + 1}^{ts} \right)} \right\}_{t = 1}^{L}\]

where f is some summary statistic applied to each of the length-s windows $(x_i)_{i = (t - 1)s + 1}^{ts}$. Different choices of f yield different multiscale methods appearing in the literature. For example:

  • f == Statistics.mean yields the original first-moment multiscale sample entropy Costa2002.
  • f == Statistics.var yields the generalized multiscale sample entropy Costa2015, which uses the second-moment (variance) instead of the mean.
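
To make the coarse-graining concrete, here is a minimal stand-alone sketch of the windowing described above (coarse_grain is a hypothetical helper, not part of the package API):

using Statistics

# Split `x` into non-overlapping windows of length `s` and summarize each window with `f`.
function coarse_grain(x::AbstractVector, s::Integer; f = mean)
    L = length(x) ÷ s
    return [f(view(x, (t - 1)*s + 1 : t*s)) for t in 1:L]
end

x = randn(1000)
D_mean = coarse_grain(x, 3)          # scale 3, first-moment (mean) coarse-graining
D_var  = coarse_grain(x, 3; f = var) # scale 3, second-moment (variance) coarse-graining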

Keyword Arguments

  • scales. The downsampling levels. If scales is set to an integer, then this integer is taken as the maximum number of scales (i.e. levels of downsampling), and downsampling is done over levels 1:scales. Otherwise, downsampling is done over the provided scales, which may be a range or a collection of specific scales (e.g. scales = [1, 5, 6]). The maximum scale level is length(x) ÷ 2, but to avoid applying the method to extremely short time series, consider limiting the maximum scale (e.g. scales = length(x) ÷ 5).

See also: CompositeDownsampling.

ComplexityMeasures.RelativeAmountType
RelativeAmount <: ProbabilitiesEstimator
RelativeAmount()

The RelativeAmount estimator is used with probabilities and related functions to estimate probabilities over the given OutcomeSpace using maximum likelihood estimation (MLE), also called plug-in estimation. See ProbabilitiesEstimator for usage.

Description

Consider a length-m outcome space $\Omega$ and random sample of length N. The maximum likelihood estimate of the probability of the k-th outcome $\omega_k$ is

\[p(\omega_k) = \dfrac{n_k}{N},\]

where $n_k$ is the number of times the k-th outcome was observed in the (encoded) sample.

This estimation is known as maximum likelihood estimation. However, RelativeAmount also serves as the fall-back probabilities estimator for OutcomeSpaces that are not count-based and only yield "pseudo-counts", for example WaveletOverlap or PowerSpectrum. These outcome spaces do not yield counts, but pre-normalized numbers that can be treated as "relative frequencies" or "relative power". Hence, this estimator is called RelativeAmount.

Examples

using ComplexityMeasures
x = cumsum(randn(100))
ps = probabilities(OrdinalPatterns{3}(), x) # `RelativeAmount` is the default estimator
ps_mle = probabilities(RelativeAmount(), OrdinalPatterns{3}(), x) # equivalent
ps == ps_mle # true

See also: BayesianRegularization, Shrinkage.

ComplexityMeasures.RelativeFirstDifferenceEncodingType
RelativeFirstDifferenceEncoding <: Encoding
RelativeFirstDifferenceEncoding(minval::Real, maxval::Real; n = 2)

RelativeFirstDifferenceEncoding encodes a vector based on the relative position the average of the first differences of the vectors has with respect to a predefined minimum and maximum value (minval and maxval, respectively).

Description

This encoding is inspired by Azami2016's algorithm for amplitude-aware permutation entropy. They use a linear combination of amplitude information and first differences information of state vectors to correct probabilities. Here, however, we explicitly encode the first differences part of the correction as an integer symbol Λ ∈ [1, 2, …, n]. The amplitude part of the encoding is available as the RelativeMeanEncoding encoding.

Encoding/decoding

When used with encode, an $m$-element state vector $\bf{x} = (x_1, x_2, \ldots, x_m)$ is encoded as $Λ = \dfrac{1}{m - 1}\sum_{k=2}^m |x_{k} - x_{k-1}|$. The value of $Λ$ is then normalized to lie on the interval [0, 1], assuming that the minimum/maximum value any single $|x_k - x_{k-1}|$ can take is minval/maxval, respectively. Finally, the interval [0, 1] is discretized into n discrete bins, enumerated by positive integers 1, 2, …, n, and the number of the bin that the normalized $Λ$ falls into is returned. The smaller the mean first difference of the state vector, the smaller the bin number; the larger the mean first difference, the larger the bin number.

When used with decode, the left-edge of the bin that the normalized $Λ$ fell into is returned.

Performance tips

If you are encoding multiple input vectors, it is more efficient to construct a RelativeFirstDifferenceEncoding instance and re-use it:

using ComplexityMeasures

minval, maxval = 0, 1
encoding = RelativeFirstDifferenceEncoding(minval, maxval; n = 4)
pts = [rand(3) for i = 1:1000]
[encode(encoding, x) for x in pts]
ComplexityMeasures.RelativeMeanEncodingType
RelativeMeanEncoding <: Encoding
RelativeMeanEncoding(minval::Real, maxval::Real; n = 2)

RelativeMeanEncoding encodes a vector based on the relative position the mean of the vector has with respect to a predefined minimum and maximum value (minval and maxval, respectively).

Description

This encoding is inspired by Azami2016's algorithm for amplitude-aware permutation entropy. They use a linear combination of amplitude information and first differences information of state vectors to correct probabilities. Here, however, we explicitly encode the amplitude part of the correction as an integer symbol Λ ∈ [1, 2, …, n]. The first-difference part of the encoding is available as the RelativeFirstDifferenceEncoding encoding.

Encoding/decoding

When used with encode, an $m$-element state vector $\bf{x} = (x_1, x_2, \ldots, x_m)$ is encoded as $Λ = \dfrac{1}{m}\sum_{i=1}^m |x_i|$. The value of $Λ$ is then normalized to lie on the interval [0, 1], assuming that the minimum/maximum value any single element $x_i$ can take is minval/maxval, respectively. Finally, the interval [0, 1] is discretized into n discrete bins, enumerated by positive integers 1, 2, …, n, and the number of the bin that the normalized $Λ$ falls into is returned.

When used with decode, the left-edge of the bin that the normalized $Λ$ fell into is returned.
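
For example (a minimal sketch, analogous to the RelativeFirstDifferenceEncoding example):

using ComplexityMeasures
minval, maxval = 0, 1
encoding = RelativeMeanEncoding(minval, maxval; n = 4)
x = rand(5)               # a state vector with elements in [0, 1]
bin = encode(encoding, x) # an integer in 1:4
decode(encoding, bin)     # left edge of the bin that `x` was mapped to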

ComplexityMeasures.RenyiType
Renyi <: InformationMeasure
Renyi(q, base = 2)
Renyi(; q = 1.0, base = 2)

The Rényi generalized order-q entropy Rényi1961, used with information to compute an entropy with units given by base (typically 2 or MathConstants.e).

Description

Let $p$ be an array of probabilities (summing to 1). Then the Rényi generalized entropy is

\[H_q(p) = \frac{1}{1-q} \log \left(\sum_i p[i]^q\right)\]

and generalizes other known entropies, like e.g. the information entropy ($q = 1$, see Shannon1948), the maximum entropy ($q=0$, also known as Hartley entropy), or the correlation entropy ($q = 2$, also known as collision entropy).

The maximum value of the Rényi entropy is $\log_{base}(L)$, which is the entropy of the uniform distribution with $L$ the total_outcomes.
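
For example, a discrete Rényi entropy over ordinal patterns (a minimal sketch):

using ComplexityMeasures
x = randn(1000)
o = OrdinalPatterns{3}()
h1 = information(Renyi(q = 1.0), o, x) # reduces to the Shannon entropy
h2 = information(Renyi(q = 2.0), o, x) # collision entropy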

ComplexityMeasures.RenyiExtropyType
RenyiExtropy <: InformationMeasure
RenyiExtropy(; q = 1.0, base = 2)

The Rényi extropy Liu2023.

Description

RenyiExtropy is used with information to compute

\[J_R(P) = \dfrac{-(N - 1) \log{(N - 1)} + (N - 1) \log{ \left( \sum_{i=1}^N {(1 - p[i])}^q \right)} }{q - 1}\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given base. Alternatively, RenyiExtropy can be used with information_normalized, which ensures that the computed extropy is on the interval $[0, 1]$ by normalizing to the maximal Rényi extropy, given by

\[J_R(P) = (N - 1)\log \left( \dfrac{N}{N-1} \right) .\]

ComplexityMeasures.ReverseDispersionType
ReverseDispersion <: ComplexityEstimator
ReverseDispersion(; c = 3, m = 2, τ = 1, check_unique = true)

Estimator for the reverse dispersion entropy complexity measure Li2019.

Description

Li2019 defines the reverse dispersion entropy as

\[H_{rde} = \sum_{i = 1}^{c^m} \left(p_i - \dfrac{1}{{c^m}} \right)^2 = \left( \sum_{i=1}^{c^m} p_i^2 \right) - \dfrac{1}{c^{m}}\]

where the probabilities $p_i$ are obtained precisely as for the Dispersion probability estimator. Relative frequencies of dispersion patterns are computed using the given encoding scheme, which defaults to encoding using the normal cumulative distribution function (NCDF), as implemented by GaussianCDFEncoding, using embedding dimension m and embedding delay τ. Recommended parameter values Li2018 are m ∈ [2, 3], τ = 1 for the embedding, and c ∈ [3, 4, …, 8] categories for the Gaussian mapping.

If normalizing, then the reverse dispersion entropy is normalized to [0, 1].

The minimum value of $H_{rde}$ is zero and occurs precisely when the dispersion pattern distribution is flat, which occurs when all $p_i$s are equal to $1/c^m$. Because $H_{rde} \geq 0$, $H_{rde}$ can therefore be said to be a measure of how far the dispersion pattern probability distribution is from white noise.

Data requirements

The input must have more than one unique element for the default GaussianCDFEncoding to be well-defined. Li2018 recommends that x has at least 1000 data points.

If check_unique == true (default), then it is checked that the input has more than one unique value. If check_unique == false and the input only has one unique element, then an InexactError is thrown when trying to compute probabilities.
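
A minimal usage sketch:

using ComplexityMeasures
x = randn(2000) # at least ~1000 points recommended
est = ReverseDispersion(c = 4, m = 2, τ = 1)
h_rde = complexity(est, x)
h_rde_norm = complexity_normalized(est, x) # normalized to [0, 1]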

ComplexityMeasures.SampleEntropyType
SampleEntropy([x]; r = 0.2std(x), kwargs...) <: ComplexityEstimator

An estimator for the sample entropy complexity measure Richman2000, used with complexity and complexity_normalized.

The keyword argument r is mandatory if an input timeseries x is not provided.

Keyword arguments

  • r::Real: The radius used when querying for nearest neighbors around points. Its value should be determined from the input data, for example as some proportion of the standard deviation of the data.
  • m::Int = 2: The embedding dimension.
  • τ::Int = 1: The embedding lag.

Description

An estimator for sample entropy using radius r, embedding dimension m, and embedding lag τ is

\[SampEn(m,r, N) = -\ln{\dfrac{A(r, N)}{B(r, N)}}.\]

Here,

\[\begin{aligned} B(r, m, N) = \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta(d({\bf x}_i^m, {\bf x}_j^m) \leq r) \\ A(r, m, N) = \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta(d({\bf x}_i^{m+1}, {\bf x}_j^{m+1}) \leq r) \\ \end{aligned},\]

where $\theta(\cdot)$ returns 1 if the argument is true and 0 otherwise, and $d(x, y)$ computes the Chebyshev distance between $x$ and $y$, and ${\bf x}_i^{m}$ and ${\bf x}_i^{m+1}$ are m-dimensional and m+1-dimensional embedding vectors, where k-dimensional embedding vectors are constructed from the input timeseries $x(t)$ as

\[{\bf x}_i^k = (x(i), x(i+τ), x(i+2τ), \ldots, x(i+(k-1)\tau)).\]

Quoting Richman & Moorman (2002): "SampEn(m,r,N) will be defined except when B = 0, in which case no regularity has been detected, or when A = 0, which corresponds to a conditional probability of 0 and an infinite value of SampEn(m,r,N)". In these cases, NaN is returned.

If computing the normalized measure, then the resulting sample entropy is on [0, 1].

Flexible embedding lag

The original algorithm fixes τ = 1. All formulas here are modified to account for any τ.

See also: entropy_sample.
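
For example (a minimal sketch):

using ComplexityMeasures
x = randn(2000)
est = SampleEntropy(x; m = 2, τ = 1) # r defaults to 0.2 * std(x)
sampen = complexity(est, x)
sampen_norm = complexity_normalized(est, x) # on [0, 1]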

ComplexityMeasures.SequentialPairDistancesType
SequentialPairDistances <: CountBasedOutcomeSpace
SequentialPairDistances(x::AbstractVector; n = 3, metric = Chebyshev(), m = 3, τ = 1)
SequentialPairDistances(x::AbstractStateSpaceSet; n = 3, metric = Chebyshev())

An outcome space based on the distribution of distances of sequential pairs of points.

This outcome space appears implicitly as part of the "distribution entropy" introduced by Li2015, which of course can be reproduced here (see example below). We've generalized the method to be used with any InformationMeasure and DiscreteInfoEstimator, and with any valid distance metric (from Distances.jl).

Input data x are needed for initialization, because distances must be pre-computed to know the minimum/maximum distances needed for binning the distribution of pairwise distances. If the input is an AbstractVector, then the vector is embedded before computing distances. If the input is an AbstractStateSpaceSet, then the embedding step is skipped and distances are computed directly on each state vector xᵢ ∈ x.

Description

SequentialPairDistances does the following:

  • Transforms the input timeseries x by first embedding it using embedding dimension m and embedding lag τ (or skip this step if the input is already embedded).
  • Computes the distances ds between sequential pairs of points according to the given metric.
  • Divides the interval [minimum(ds), maximum(ds)] into n equal-size bins by using RectangularBinEncoding, then maps the distances onto these bins.

Outcome space

The outcome space Ω for SequentialPairDistances are the bins onto which the pairwise distances are mapped, encoded as the integers 1:n. If you need the actual bin coordinates, these can be recovered with decode (see example below).

Implements

  • codify. Note that the input x is ignored when calling codify, because the input data is already handled when constructing a SequentialPairDistances.

Examples

The outcome bins can be retrieved as follows.

using ComplexityMeasures
x = rand(100)
o = SequentialPairDistances(x)
cts, outs = counts_and_outcomes(o, x)

Computing the "distribution entropy" with n = 3 bins for the distance histogram:

using ComplexityMeasures
x = rand(1000000)
o = SequentialPairDistances(x, n = 3, metric = Chebyshev()) # metric from original paper
h = information(Shannon(base = 2), o, x)
ComplexityMeasures.ShannonType
Shannon <: InformationMeasure
Shannon(; base = 2)

The Shannon Shannon1948 entropy, used with information to compute:

\[H(p) = - \sum_i p[i] \log(p[i])\]

with the $\log$ at the given base.

The maximum value of the Shannon entropy is $\log_{base}(L)$, which is the entropy of the uniform distribution with $L$ the total_outcomes.
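
For example, the Shannon entropy of a uniform distribution over four outcomes attains this maximum (a minimal sketch, assuming information accepts a pre-computed Probabilities):

using ComplexityMeasures
probs = Probabilities([0.25, 0.25, 0.25, 0.25])
information(Shannon(base = 2), probs) # log2(4) = 2 bits, the maximum for 4 outcomes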

ComplexityMeasures.ShannonExtropyType
ShannonExtropy <: InformationMeasure
ShannonExtropy(; base = 2)

The Shannon extropy Lad2015, used with information to compute

\[J(x) = -\sum_{i=1}^N (1 - p[i]) \log{(1 - p[i])},\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given base.

ComplexityMeasures.ShrinkageType
Shrinkage{<:OutcomeSpace} <: ProbabilitiesEstimator
Shrinkage(; t = nothing, λ = nothing)

The Shrinkage estimator is used with probabilities and related functions to estimate probabilities over the given m-element counting-based OutcomeSpace using James-Stein-type shrinkage JamesStein1992, as presented in Hausser2009.

Description

The Shrinkage estimator estimates a cell probability $\theta_{k}^{\text{Shrink}}$ as

\[\theta_{k}^{\text{Shrink}} = \lambda t_k + (1-\lambda) \hat{\theta}_k^{RelativeAmount},\]

where $\lambda \in [0, 1]$ is the shrinkage intensity ($\lambda = 0$ means no shrinkage, and $\lambda = 1$ means full shrinkage), and $t_k$ is the shrinkage target. Hausser2009 picks $t_k = 1/m$, i.e. the uniform distribution.

If t == nothing, then $t_k$ is set to $1/m$ for all $k$, as in Hausser2009. If λ == nothing (the default), then the shrinkage intensity is optimized according to Hausser2009. Hence, you should probably not pick λ nor t manually, unless you know what you are doing.

Assumptions

The Shrinkage estimator assumes a fixed and known number of outcomes m. Thus, using it with probabilities_and_outcomes and allprobabilities_and_outcomes will yield different results, depending on whether all outcomes are observed in the input data or not. For probabilities_and_outcomes, m is the number of observed outcomes. For allprobabilities_and_outcomes, m = total_outcomes(o, x), where o is the OutcomeSpace and x is the input data.

Note

If used with allprobabilities_and_outcomes, then outcomes which have not been observed may be assigned non-zero probabilities. This might affect your results if using e.g. missing_outcomes.

Examples

using ComplexityMeasures
x = cumsum(randn(100))
ps_shrink = probabilities(Shrinkage(), OrdinalPatterns{3}(), x)

See also: RelativeAmount, BayesianRegularization.

ComplexityMeasures.SpatialBubbleSortSwapsType
SpatialBubbleSortSwaps <: SpatialOutcomeSpace
SpatialBubbleSortSwaps(stencil, x; periodic = true)

SpatialBubbleSortSwaps generalizes BubbleSortSwaps to high-dimensional arrays by encoding pixel/voxel/hypervoxel windows in terms of how many swap operations the bubble sort algorithm requires to sort them.

What does this mean? For BubbleSortSwaps the input data is embedded using embedding dimension m and the number of swaps required are computed for each embedding vector. For SpatialBubbleSortSwaps, the "embedding dimension" m is inferred from the number of elements in the stencil, and the "embedding vectors" are the hypervoxels selected by the stencil.

Outcome space

The outcome space Ω for SpatialBubbleSortSwaps is the range of integers 0:(n*(n-1)÷2), corresponding to the number of swaps required by the bubble sort algorithm to sort a particular pixel/voxel/hypervoxel window.

Arguments

  • stencil. Defines what local area (hyperrectangle), or which points within this area, to include around each hypervoxel (i.e. pixel in 2D). See SpatialOrdinalPatterns and SpatialDispersion for more information about stencils and examples of how to specify them.
  • x::AbstractArray. The input data. Must be provided because we need to know its size for optimization and bound checking.

Keyword arguments

  • periodic::Bool. If periodic == true, then the stencil wraps around at the end of the array. If periodic == false, then pixels whose stencil exceeds the array bounds are skipped.

Example

using ComplexityMeasures
using Random; rng = MersenneTwister(1234)

x = rand(rng, 100, 100, 100) # some 3D image
stencil = zeros(Int,2,2,2) # 3D stencil
stencil[:, :, 1] = [1 0; 1 1]
stencil[:, :, 2] = [0 1; 1 0]
o = SpatialBubbleSortSwaps(stencil, x)

# Distribution of "bubble sorting complexity" among voxel windows
counts_and_outcomes(o, x)

# "Spatial bubble Kaniadakis entropy", with shrinkage-adjusted probabilities
information(Kaniadakis(), Shrinkage(), o, x)
ComplexityMeasures.SpatialDispersionType
SpatialDispersion <: OutcomeSpace
SpatialDispersion(stencil, x::AbstractArray;
    periodic = true,
    c = 5,
    skip_encoding = false,
    L = nothing,
)

A dispersion-based OutcomeSpace that generalises Dispersion for input data that are high-dimensional arrays.

SpatialDispersion is based on Azami2019's 2D square dispersion (Shannon) entropy estimator, but is here implemented as a pure probabilities estimator that is generalized for N-dimensional input data x, with arbitrary neighborhood regions (stencils) and (optionally) periodic boundary conditions.

In combination with information and information_normalized, this probabilities estimator can be used to compute (normalized) generalized spatiotemporal dispersion InformationMeasure of any type.

Arguments

  • stencil. Defines what local area (hyperrectangle), or which points within this area, to include around each hypervoxel (i.e. pixel in 2D). The examples below demonstrate different ways of specifying stencils. For details and more information about stencils, see SpatialOrdinalPatterns.
  • x::AbstractArray. The input data. Must be provided because we need to know its size for optimization and bound checking.

Keyword arguments

  • periodic::Bool. If periodic == true, then the stencil wraps around at the end of the array. If periodic == false, then pixels whose stencil exceeds the array bounds are skipped.
  • c::Int. Determines how many discrete categories to use for the Gaussian encoding.
  • skip_encoding. If skip_encoding == true, encoding is ignored, and dispersion patterns are computed directly from x, under the assumption that L is the alphabet length for x (useful for categorical or integer-valued data). Thus, if skip_encoding == true, then L must also be specified.
  • L. If L == nothing (default), then the number of total outcomes is inferred from stencil and encoding. If L is set to an integer, then the data is considered pre-encoded and the number of total outcomes is set to L.

Outcome space

The outcome space for SpatialDispersion is the unique delay vectors whose elements are the symbols (integers) encoded by the Gaussian CDF. Hence, the outcome space is all m-dimensional delay vectors whose elements are all possible values in 1:c. There are c^m such vectors.

Description

Estimating probabilities/entropies from higher-dimensional data is conceptually simple.

  1. Discretize each value (hypervoxel) in x relative to all other values xᵢ ∈ x using the provided encoding scheme.
  2. Use stencil to extract relevant (discretized) points around each hypervoxel.
  3. Construct a symbol from these points.
  4. Take the sum-normalized histogram of the symbol as a probability distribution.
  5. Optionally, compute information or information_normalized from this probability distribution.

Usage

Here's how to compute spatial dispersion entropy using the three different ways of specifying stencils.

using ComplexityMeasures

x = rand(50, 50) # first "time slice" of a spatial system evolution

# Cartesian stencil
stencil_cartesian = CartesianIndex.([(0,0), (1,0), (1,1), (0,1)])
est = SpatialDispersion(stencil_cartesian, x)
information_normalized(est, x)

# Extent/lag stencil
extent = (2, 2); lag = (1, 1); stencil_ext_lag = (extent, lag)
est = SpatialDispersion(stencil_ext_lag, x)
information_normalized(est, x)

# Matrix stencil
stencil_matrix = [1 1; 1 1]
est = SpatialDispersion(stencil_matrix, x)
information_normalized(est, x)

To apply this to timeseries of spatial data, simply loop over the call (broadcast), e.g.:

imgs = [rand(50, 50) for i = 1:100]; # one image per second over 100 seconds
stencil = ((2, 2), (1, 1)) # a 2x2 stencil (i.e. dispersion patterns of length 4)
est = SpatialDispersion(stencil, first(imgs))
h_vs_t = information_normalized.(Ref(est), imgs)

Computing generalized spatiotemporal dispersion entropy is trivial, e.g. with Renyi:

x = reshape(repeat(1:5, 500) .+ 0.1*rand(500*5), 50, 50)
est = SpatialDispersion(stencil, x)
information(Renyi(q = 2), est, x)

See also: SpatialOrdinalPatterns, GaussianCDFEncoding, codify.

ComplexityMeasures.SpatialOrdinalPatternsType
SpatialOrdinalPatterns <: OutcomeSpaceModel
SpatialOrdinalPatterns(stencil, x; periodic = true)

A symbolic, permutation-based OutcomeSpace for spatiotemporal systems that generalises OrdinalPatterns to high-dimensional arrays. The order m of the permutation pattern is extracted from the stencil, see below.

SpatialOrdinalPatterns is based on the 2D and 3D spatiotemporal permutation entropy estimators by Ribeiro2012 and Schlemmer2018, respectively, but is here implemented as a pure probabilities estimator that is generalized for D-dimensional input array x, with arbitrary regions (stencils) from which to extract patterns, and (possibly) periodic boundary conditions.

See below for ways to specify the stencil. If periodic = true, then the stencil wraps around at the ends of the array. If false, then collected regions with indices which exceed the array bounds are skipped.

In combination with information and information_normalized, this probabilities estimator can be used to compute generalized spatiotemporal permutation InformationMeasure of any type.

Outcome space

The outcome space Ω for SpatialOrdinalPatterns is the set of length-m ordinal patterns (i.e. permutations) that can be formed by the integers 1, 2, …, m, ordered lexicographically. There are factorial(m) such patterns. Here m refers to the number of points included in stencil.

Stencils

The stencil defines what local area to use to group hypervoxels. Each grouping of hypervoxels is mapped to an order-m permutation pattern, which is then mapped to an integer as in OrdinalPatterns. The stencil is moved around the input array, in a sense "scanning" the input array, to collect all possible groupings allowed by the boundary condition (periodic or not).

Stencils are passed in one of the following three ways:

  1. As vectors of CartesianIndex which encode the offset of indices to include in the stencil, with respect to the current array index when scanning over the array. For example stencil = CartesianIndex.([(0,0), (0,1), (1,1), (1,0)]). Don't forget to include the zero offset index if you want to include the hypervoxel itself, which is almost always the case. Here the stencil creates a 2x2 square extending to the bottom and right of the pixel (directions here correspond to the way Julia prints matrices by default). When passing a stencil as a vector of CartesianIndex, m = length(stencil).
  2. As a D-dimensional array (where D matches the dimensionality of the input data) containing 0s and 1s, where if stencil[index] == 1, the corresponding pixel is included, and if stencil[index] == 0, it is not included. To generate the same estimator as in 1., use stencil = [1 1; 1 1]. When passing a stencil as a D-dimensional array, m = sum(stencil)
  3. As a Tuple containing two Tuples, both of length D, for D-dimensional data. The first tuple specifies the extent of the stencil, where extent[i] dictates the number of hypervoxels to be included along the ith axis and lag[i] the separation of hypervoxels along the same axis. This method can only generate (hyper)rectangular stencils. To create the same estimator as in the previous examples, use here stencil = ((2, 2), (1, 1)). When passing a stencil using extent and lag, m = prod(extent).
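
The three ways of specifying the same 2x2 stencil, shown in code (a minimal sketch):

using ComplexityMeasures
x = rand(50, 50) # e.g. a 2D spatial field

stencil_idx = CartesianIndex.([(0,0), (0,1), (1,1), (1,0)]) # way 1: m = length(stencil) = 4
stencil_arr = [1 1; 1 1]                                    # way 2: m = sum(stencil) = 4
stencil_ext = ((2, 2), (1, 1))                              # way 3: (extent, lag), m = prod(extent) = 4

est = SpatialOrdinalPatterns(stencil_idx, x)
information(Shannon(), est, x) # spatial permutation entropy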
ComplexityMeasures.StatisticalComplexityType
StatisticalComplexity <: ComplexityEstimator
StatisticalComplexity(; kwargs...)

An estimator for the statistical complexity and entropy, originally by Rosso2007 and generalized by Rosso2013.

Our implementation extends the generalization to any valid distance metric, any OutcomeSpace with a priori known total_outcomes, any ProbabilitiesEstimator, and any normalizable discrete InformationMeasure.

Used with complexity.

Keyword arguments

  • o::OutcomeSpace = OrdinalPatterns{3}(). The OutcomeSpace, which controls how the input data are discretized.
  • pest::ProbabilitiesEstimator = RelativeAmount(): The ProbabilitiesEstimator used to estimate probabilities over the discretized input data.
  • hest = Renyi(): A DiscreteInfoEstimator or an InformationMeasure. Any information measure that defines information_maximum is valid here including extropies. The measure will be estimated using the PlugIn estimator if not given an estimator.
  • dist <: SemiMetric = JSDivergence(): The distance measure (from Distances.jl) to use for estimating the distance between the estimated probability distribution and a uniform distribution with the same maximal number of outcomes.

Description

Statistical complexity is defined as

\[C_q[P] = \mathcal{H}_q\cdot \mathcal{Q}_q[P],\]

where $Q_q$ is a "disequilibrium" obtained from a distance measure and $H_q$ a disorder measure. In the original paper Rosso2007, this complexity measure was defined via an ordinal pattern-based probability distribution (see OrdinalPatterns), using Shannon entropy as the information measure, and the Jensen-Shannon divergence as a distance measure.

Our implementation is a further generalization of the complexity measure developed in Rosso2013. We let $H_q$ be any normalizable InformationMeasure, e.g. Shannon, Renyi or Tsallis entropy, and we let $Q_q$ be based on either the Euclidean, Wooters, Kullback, q-Kullback, Jensen or q-Jensen distance, defined as

\[Q_q[P] = Q_q^0\cdot D[P, P_e],\]

where $D[P, P_e]$ is the distance between the obtained distribution $P$ and a uniform distribution with the same maximum number of bins, measured by the distance measure dist.

Usage

The statistical complexity is always computed in combination with the chosen information measure (typically an entropy). The estimated value of the information measure can be obtained alongside the complexity as follows:

x = randn(100)
c = StatisticalComplexity()
compl = complexity(c, x)
entr = first(entropy_complexity(c, x)) # the entropy value (first element of the tuple returned by `entropy_complexity`)

complexity(c::StatisticalComplexity, x) returns only the statistical complexity. To obtain both the value of the entropy (or other information measure) and the statistical complexity together as a Tuple, use the wrapper entropy_complexity.

See also: entropy_complexity_curves.

ComplexityMeasures.StretchedExponentialType
StretchedExponential <: InformationMeasure
StretchedExponential(; η = 2.0, base = 2)

The stretched exponential, or Anteneodo-Plastino, entropy Anteneodo1999, used with information to compute

\[S_{\eta}(p) = \sum_{i = 1}^N \Gamma \left( \dfrac{\eta + 1}{\eta}, - \log_{base}(p_i) \right) - p_i \Gamma \left( \dfrac{\eta + 1}{\eta} \right),\]

where $\eta \geq 0$, $\Gamma(\cdot, \cdot)$ is the upper incomplete Gamma function, and $\Gamma(\cdot) = \Gamma(\cdot, 0)$ is the Gamma function. Reduces to Shannon entropy for η = 1.0.

The maximum entropy for StretchedExponential is a rather complicated expression involving incomplete Gamma functions (see source code).

ComplexityMeasures.TransferOperatorType
TransferOperator <: OutcomeSpace
TransferOperator(b::AbstractBinning; warn_precise = true, rng = Random.default_rng())

An OutcomeSpace based on binning data into rectangular boxes dictated by the given binning scheme b.

When used with probabilities, then the transfer (Perron-Frobenius) operator is approximated over the bins, then bin probabilities are estimated as the invariant measure associated with that transfer operator. Assumes that the input data are sequential (time-ordered).

This implementation follows the grid estimator approach in Diego2019.

Precision

The default behaviour when using RectangularBinning or FixedRectangularBinning is to accept some loss of precision on the bin boundaries for speed-ups, but this may lead to issues for TransferOperator where some points may be encoded as the symbol -1 ("outside the binning"). The warn_precise keyword controls whether the user is warned when a less precise binning is used.

Outcome space

The outcome space for TransferOperator is the set of unique bins constructed from b. Bins are identified by their left (lowest-value) corners, are given in data units, and are returned as SVectors.

Bin ordering

Bins returned by probabilities_and_outcomes are ordered according to first appearance (i.e. the first time the input (multivariate) timeseries visits the bin). Thus, if

b = RectangularBinning(4)
est = TransferOperator(b)
probs, outcomes = probabilities_and_outcomes(est, x) # x is some timeseries

then probs[i] is the invariant measure (probability) of the bin outcomes[i], which is the i-th bin visited by the timeseries with nonzero measure.

Description

The transfer operator $P^{N}$ is computed as an N-by-N matrix of transition probabilities between the states defined by the partition elements, where N is the number of boxes in the partition that is visited by the orbit/points.

If $\{x_t^{(D)} \}_{t=1}^L$ are the $L$ different $D$-dimensional points over which the transfer operator is approximated, $\{ C_k \}_{k=1}^N$ are the $N$ different partition elements (as dictated by the binning b) that get visited by the points, and $\phi(x_t) = x_{t+1}$, then

\[P_{ij} = \dfrac {\#\{ x_n | \phi(x_n) \in C_j \cap x_n \in C_i \}} {\#\{ x_m | x_m \in C_i \}},\]

where $\#$ denotes the cardinal. The element $P_{ij}$ thus indicates how many points that are initially in box $C_i$ end up in box $C_j$ when the points in $C_i$ are projected one step forward in time. Thus, the row $P_{ik}^N$ where $k \in \{1, 2, \ldots, N \}$ gives the probability of jumping from the state defined by box $C_i$ to any of the other $N$ states. It follows that $\sum_{k=1}^{N} P_{ik} = 1$ for all $i$. Thus, $P^N$ is a row/right stochastic matrix.

Invariant measure estimation from transfer operator

The left invariant distribution $\mathbf{\rho}^N$ is a row vector, where $\mathbf{\rho}^N P^{N} = \mathbf{\rho}^N$. Hence, $\mathbf{\rho}^N$ is a row eigenvector of the transfer matrix $P^{N}$ associated with eigenvalue 1. The distribution $\mathbf{\rho}^N$ approximates the invariant density of the system subject to binning, and can be taken as a probability distribution over the partition elements.

In practice, the invariant measure $\mathbf{\rho}^N$ is computed using invariantmeasure, which also approximates the transfer matrix. The invariant distribution is initialized as a length-N random distribution which is then applied to $P^{N}$. For reproducibility in this step, set the rng. The resulting length-N distribution is then applied to $P^{N}$ again. This process repeats until the difference between the distributions over consecutive iterations is below some threshold.

See also: RectangularBinning, FixedRectangularBinning, invariantmeasure.

ComplexityMeasures.TransferOperatorApproximationRectangularType
TransferOperatorApproximationRectangular(to, binning::RectangularBinning, mini,
    edgelengths, bins, sort_idxs)

The N-by-N matrix to is an approximation to the transfer operator, subject to the given binning, computed over some set of sequentially ordered points.

For convenience, mini and edgelengths provide the minima and box edge lengths along each coordinate axis, as determined by applying the binning to the points. The coordinates of the (leftmost, if axis is ordered low-high) box corners are given in bins.

Only bins actually visited by the points are considered, and bins give the coordinates of these bins. The element bins[i] corresponds to the i-th state of the system, which corresponds to the i-th column/row of the transfer operator to.

sort_idxs contains the indices that would sort the input points. visitors is a vector of vectors, where visitors[i] contains the indices of the (sorted) points that visit bins[i].

See also: RectangularBinning.

ComplexityMeasures.TsallisType
Tsallis <: InformationMeasure
Tsallis(q; k = 1.0, base = 2)
Tsallis(; q = 1.0, k = 1.0, base = 2)

The Tsallis generalized order-q entropy Tsallis1988, used with information to compute an entropy.

base only applies in the limiting case q == 1, in which the Tsallis entropy reduces to Shannon entropy.

Description

The Tsallis entropy is a generalization of the Boltzmann-Gibbs entropy, with k standing for the Boltzmann constant. It is defined as

\[S_q(p) = \frac{k}{q - 1}\left(1 - \sum_{i} p[i]^q\right)\]

The maximum value of the Tsallis entropy is $k(L^{1 - q} - 1)/(1 - q)$, with $L$ the total_outcomes.

ComplexityMeasures.TsallisExtropyType
TsallisExtropy <: InformationMeasure
TsallisExtropy(; base = 2)

The Tsallis extropy Xue2023.

Description

TsallisExtropy is used with information to compute

\[J_T(P) = k \dfrac{N - 1 - \sum_{i=1}^N ( 1 - p[i])^q}{q - 1}\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given base. Alternatively, TsallisExtropy can be used with information_normalized, which ensures that the computed extropy is on the interval $[0, 1]$ by normalizing to the maximal Tsallis extropy, given by

\[J_T(P) = \dfrac{(N - 1)N^{q - 1} - (N - 1)^q}{(q - 1)N^{q - 1}}\]

ComplexityMeasures.UniqueElementsType
UniqueElements()

An OutcomeSpace based on straight-forward counting of distinct elements in a univariate time series or multivariate dataset. This is the same as giving no estimator to probabilities.

Outcome space

The outcome space is the unique sorted values of the input. Hence, input x is needed for a well-defined outcome_space.

Implements

  • codify. Used for encoding inputs where ordering matters (e.g. time series).
ComplexityMeasures.UniqueElementsEncodingType
UniqueElementsEncoding <: Encoding
UniqueElementsEncoding(x)

UniqueElementsEncoding is a generic encoding that encodes each xᵢ ∈ unique(x) to one of the positive integers. The xᵢ are encoded according to the order of their first appearance in the input data.

The constructor requires the input data x, since the number of possible symbols is length(unique(x)).

Example

using ComplexityMeasures
x = ['a', 2, 5, 2, 5, 'a']
e = UniqueElementsEncoding(x)
encode.(Ref(e), x) == [1, 2, 3, 2, 3, 1] # true
ComplexityMeasures.ValueBinningType
ValueBinning(b::AbstractBinning) <: OutcomeSpace

An OutcomeSpace based on binning the values of the data as dictated by the binning scheme b and formally computing their histogram, i.e., the frequencies of points in the bins. An alias to this is VisitationFrequency. Available binnings are subtypes of AbstractBinning.

The ValueBinning estimator has a linearithmic time complexity (n log(n) for n = length(x)) and a linear space complexity (l for l = dimension(x)). This allows computation of probabilities (histograms) of high-dimensional datasets and with small box sizes ε without memory overflow and with maximum performance. For performance reasons, the probabilities returned never contain 0s and are arbitrarily ordered.

ValueBinning(ϵ::Union{Real,Vector})

A convenience method that accepts same input as RectangularBinning and initializes this binning directly.

Outcomes

The outcome space for ValueBinning is the unique bins constructed from b. Each bin is identified by its left (lowest-value) corner, because bins are always left-closed-right-open intervals [a, b). The bins are in data units, not in integer (Cartesian index) units, and are returned as SVectors, i.e., the same type as the input data.

For convenience, outcome_space returns the outcomes in the same array format as the underlying binning (e.g., Matrix for 2D input).

For FixedRectangularBinning the outcome_space is well-defined from the binning, but for RectangularBinning input x is needed as well.

Implements

  • codify. Used for encoding inputs where ordering matters (e.g. time series).
ComplexityMeasures.VasicekType
Vasicek <: DifferentialInfoEstimator
Vasicek(definition = Shannon(); m::Int = 1)

The Vasicek estimator computes the Shannon differential information of a timeseries using the method from Vasicek1976, with logarithms to the base specified in definition.

The Vasicek estimator belongs to a class of differential entropy estimators based on order statistics, of which Vasicek1976 was the first. It only works for timeseries input.

Description

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function $f : \mathbb{R} \to \mathbb{R}$. Vasicek estimates the Shannon differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The Vasicek Shannon differential entropy estimate is then

\[\hat{H}_V(\bar{X}, m) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{n}{2m} (\bar{X}_{(i+m)} - \bar{X}_{(i-m)}) \right]\]

Usage

In practice, the choice of m influences how fast the entropy converges to the true value. For small values of m, convergence is slow, so we recommend scaling m with the time series length n and using m >= n/100 (this is just a heuristic based on the tests written for this package).

See also: information, Correa, AlizadehArghami, Ebrahimi, DifferentialInfoEstimator.
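
For example (a minimal sketch; the true differential entropy of a standard normal is 0.5 log(2πe) ≈ 1.42 nats):

using ComplexityMeasures
n = 10_000
x = randn(n)
h = information(Vasicek(Shannon(; base = MathConstants.e); m = n ÷ 100), x) # ≈ 1.42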

ComplexityMeasures.WaveletOverlapType
WaveletOverlap([wavelet]) <: OutcomeSpace

An OutcomeSpace based on the maximal overlap discrete wavelet transform (MODWT).

When used with probabilities, the MODWT is applied to a signal, then probabilities are computed as the (normalized) energies at different wavelet scales. These probabilities are used to compute the wavelet entropy according to Rosso2001. Input timeseries x is needed for a well-defined outcome space.

By default the wavelet Wavelets.WT.Daubechies{12}() is used. Otherwise, you may choose a wavelet from the Wavelets package (it must subtype OrthoWaveletClass).

Outcome space

The outcome space for WaveletOverlap are the integers 1, 2, …, N enumerating the wavelet scales. To obtain a better understanding of what these mean, we prepared a notebook you can view online. As such, this estimator only works for timeseries input and input x is needed for a well-defined outcome_space.

ComplexityMeasures.WeightedOrdinalPatternsType
WeightedOrdinalPatterns <: OutcomeSpace
WeightedOrdinalPatterns{m}(τ = 1, lt::Function = ComplexityMeasures.isless_rand)

A variant of OrdinalPatterns that also incorporates amplitude information, based on the weighted permutation entropy Fadlallah2013. The outcome space and arguments are the same as in OrdinalPatterns.

Description

For each ordinal pattern extracted from each state (or delay) vector, a weight is attached to it which is the variance of the vector. Probabilities are then estimated by summing the weights corresponding to the same pattern, instead of just counting the occurrence of the same pattern.

An implementation note

Note: in equation 7, section III, of the original paper, the authors write

\[w_j = \dfrac{1}{m}\sum_{k=1}^m (x_{j-(k-1)\tau} - \mathbf{\hat{x}}_j^{m, \tau})^2.\]

But given the formula they give for the arithmetic mean, this is not the variance of the delay vector $\mathbf{x}_i$, because the indices are mixed: $x_{j+(k-1)\tau}$ in the weights formula, vs. $x_{j+(k+1)\tau}$ in the arithmetic mean formula. Here, delay embedding and computation of the patterns and their weights are completely separated processes, ensuring that we compute the arithmetic mean correctly for each vector of the input dataset (which may be a delay-embedded timeseries).

ComplexityMeasures.ZhuType
Zhu <: DifferentialInfoEstimator
Zhu(; definition = Shannon(), k = 1, w = 0)

The Zhu estimator Zhu2015 is an extension to KozachenkoLeonenko, and computes the Shannon differential information of a multi-dimensional StateSpaceSet, with logarithms to the base specified in definition.

Description

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. Zhu estimates the Shannon differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))]\]

by approximating densities within hyperrectangles surrounding each point xᵢ ∈ x using k nearest neighbor searches. w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).
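
For example, a minimal sketch (assuming a StateSpaceSet constructed from a matrix whose rows are points):

x = StateSpaceSet(randn(5_000, 2)) # a 2-dimensional input dataset
h = information(Zhu(; k = 5), x)   # Shannon differential entropy of x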

See also: information, KozachenkoLeonenko, DifferentialInfoEstimator.

ComplexityMeasures.ZhuSinghType
ZhuSingh <: DifferentialInfoEstimator
ZhuSingh(definition = Shannon(); k = 1, w = 0)

The ZhuSingh estimator Zhu2015 computes the Shannon differential information of a multi-dimensional StateSpaceSet, with logarithms to the base specified in definition.

Description

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. ZhuSingh estimates the Shannon differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

Like Zhu, this estimator approximates probabilities within hyperrectangles surrounding each point xᵢ ∈ x using k nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension to the entropy estimator in Singh2003.

w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

See also: information, DifferentialInfoEstimator.

ComplexityMeasures.AAPEFunction
AAPE(x, A::Real = 0.5, m::Int = length(x))

Encode relative amplitude information of the elements of x.

  • A = 1 emphasizes only average values.
  • A = 0 emphasizes changes in amplitude values.
  • A = 0.5 equally emphasizes average values and changes in the amplitude values.
ComplexityMeasures.allcounts_and_outcomesMethod
allcounts_and_outcomes(o::OutcomeSpace, x::Array_or_SSSet) → (cts::Counts{<:Integer, 1}, Ω)

Like counts_and_outcomes, but ensures that all outcomes Ωᵢ ∈ Ω, where Ω = outcome_space(o, x), are included.

Outcomes that do not occur in the data x get a 0 count.

ComplexityMeasures.allprobabilities_and_outcomesMethod
allprobabilities_and_outcomes(est::ProbabilitiesEstimator, x::Array_or_SSSet) → (p::Probabilities, outs)
allprobabilities_and_outcomes(o::OutcomeSpace, x::Array_or_SSSet) → (p::Probabilities, outs)

The same as probabilities_and_outcomes, but ensures that outcomes with 0 probability are explicitly added in the returned vector. This means that p[i] is the probability of ospace[i], with ospace = outcome_space(est, x).

This function is useful in cases where one wants to compare the probability mass functions of two different input data x, y under the same estimator. E.g., computing the KL-divergence of the two PMFs assumes that they obey the same indexing. This is not true for probabilities even with the same est, due to the skipping of 0 entries, but it is true for allprobabilities_and_outcomes.
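
For example, a minimal sketch of comparing the PMFs of two inputs under the same outcome space:

x, y = randn(10_000), randn(10_000)
o = OrdinalPatterns{3}(1)
px, Ω = allprobabilities_and_outcomes(o, x)
py, _ = allprobabilities_and_outcomes(o, y)
# px[i] and py[i] now refer to the same outcome Ω[i], so the PMFs can be compared elementwise.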

ComplexityMeasures.apply_multiscaleFunction
apply_multiscale(alg::MultiScaleAlgorithm, f::Function, args...)

Define multiscale dispatch for the function f (either information, complexity or their normalized variants) to downsampled timeseries resulting from coarse-graining last(args) (the input data) using coarse-graining algorithm alg with arguments args[1:end-1] (the estimation parameters).

ComplexityMeasures.base10_to_factorialFunction
base10_to_factorial(s::Int,
    ndigits::Int = ndigits_in_factorial_base(s)) → f::SVector{ndigits, Int}

Convert a base-10 integer to its factorial number system representation. f is a vector of factorial-base digits, ordered from the highest factorial to the lowest (see the example below).

For example, the base-10 integer 567, in the factorial number system, is $4\cdot 5! + 3\cdot 4! + 2\cdot 3! + 1\cdot 2! + 1\cdot 1! + 0\cdot 0!$. For this example, base10_to_factorial would return the SVector [4, 3, 2, 1, 1, 0].

ndigits fixes the number of digits in f (this just prepends a zero to f for each extraneous radix/base). This is useful when using the factorial number system for decoding Lehmer codes into permutations.

ComplexityMeasures.cartesian_bin_indexMethod
cartesian_bin_index(e::RectangularBinEncoding, point::SVector)

Return the cartesian index of the given point within the binning encapsulated in e. Internal function called by encode.

ComplexityMeasures.center_neighborhood!Method
center_neighborhood!(C, c, xᵢ, neighbors)

Center the point xᵢ, as well as each of its neighboring points nⱼ ∈ neighbors, to the (precomputed) centroid c of the points {xᵢ, n₁, n₂, …, nₖ}, and store the centered vectors in the pre-allocated vector of vectors C.

ComplexityMeasures.codifyFunction
codify(o::OutcomeSpace, x::Vector) → s::Vector{Int}
codify(o::OutcomeSpace, x::AbstractStateSpaceSet{D}) → s::NTuple{D, Vector{Int}}

Codify x according to the outcome space o. If x is a Vector, then a Vector{<:Integer} is returned. If x is a StateSpaceSet{D}, then symbolization is done column-wise and an NTuple{D, Vector{<:Integer}} is returned, where D = dimension(x).

Description

The reason this function exists is that we don't always want to encode the entire input x at once. Sometimes, it is desirable to first apply some transformation to x, and then apply Encodings in a point-wise manner in the transformed space (the OutcomeSpace dictates this transformation). This is useful for encoding timeseries data.

The length of the returned s depends on the OutcomeSpace. Some outcome spaces preserve the input data length (e.g. UniqueElements), while some outcome spaces (e.g. OrdinalPatterns) do e.g. delay embeddings before encoding, so that length(s) < length(x).
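
For example, a minimal sketch:

x = randn(1_000)
s = codify(OrdinalPatterns{3}(1), x) # Vector{Int} of encoded ordinal patterns
# length(s) == length(x) - 2 here, since (m - 1)*τ = 2 points are lost to the delay embedding.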

ComplexityMeasures.compute_ϕMethod
compute_ϕ(x::AbstractVector{T}; r = 0.2 * Statistics.std(x), k::Int = 2,
    τ::Int = 1, base = MathConstants.e) where T <: Real

Construct the embedding

\[u = \{{\bf u}_n \}_{n = 1}^{N - k + 1} = \{[x(n), x(n + 1), \ldots, x(n + k - 1)]\}_{n = 1}^{N - k + 1}\]

and use a tree-and-nearest-neighbor search approach to compute

\[\phi^k(r) = \dfrac{1}{N - kτ + 1} \sum_{i}^{N - kτ + 1} \log_{b}{(C_i^k(r))},\]

taking logarithms to base $b$, and where

\[C_i^k(r) = \textrm{number of } j \textrm{ such that } d({\bf u}_i, {\bf u}_j) < r,\]

where $d$ is the maximum (Chebyshev) distance, r is the tolerance, and N is the length of the original scalar-valued time series x.

ComplexityMeasures.convert_logunitMethod
convert_logunit(h_a::Real, base_from, base_to) → h_b

Convert a number h_a computed with logarithms to base base_from to an entropy h_b computed with logarithms to base base_to. This can be used to convert the "unit" of an entropy.
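
For example, converting a Shannon entropy from bits (base 2) to nats (base ℯ):

h_bits = information(Shannon(base = 2), UniqueElements(), [1, 1, 2, 2]) # 1 bit
h_nats = convert_logunit(h_bits, 2, MathConstants.e)                    # ≈ 0.693 nats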

ComplexityMeasures.countsFunction
counts(o::OutcomeSpace, x) → cts::Counts

Compute the same counts as in the counts_and_outcomes function, with two differences:

  1. Do not explicitly return the outcomes.
  2. If the outcomes are not estimated for free while estimating the counts, a special integer type is used to enumerate the outcomes, to avoid the computational cost of estimating the outcomes.
ComplexityMeasures.counts_and_outcomesMethod
counts_and_outcomes(o::OutcomeSpace, x) → (cts::Counts, Ω)

Discretize/encode x (which must be sortable) into a finite set of outcomes Ω specified by the provided OutcomeSpace o, and then count how often each outcome Ωᵢ ∈ Ω (i.e. each "discretized value", or "encoded symbol") appears.

Return a tuple where the first element is a Counts instance, which is vector-like and contains the counts, and where the second element Ω are the outcomes corresponding to the counts, such that cts[i] is the count for the outcome Ω[i].

The outcomes are actually included in cts, and you can use the outcomes function on the cts to get them. counts_and_outcomes returns both for backwards compatibility.

counts_and_outcomes(x) → cts::Counts

If no OutcomeSpace is specified, then UniqueElements is used as the outcome space.

Description

For OutcomeSpaces that use encode to discretize, it is possible to count how often each outcome $\omega_i \in \Omega$, where $\Omega$ is the set of possible outcomes, is observed in the discretized/encoded input data. Thus, we can assign to each outcome $\omega_i$ a count $f(\omega_i)$, such that $\sum_{i=1}^N f(\omega_i) = N$, where $N$ is the number of observations in the (encoded) input data. counts_and_outcomes returns the counts $f(\omega_i)_{obs}$ and outcomes only for the observed outcomes $\omega_i^{obs}$ (those outcomes that actually appear in the input data). If you need the counts for unobserved outcomes as well, use allcounts_and_outcomes.
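
For example, a minimal sketch using the UniqueElements outcome space:

x = ["a", "b", "a", "a", "c"]
cts, Ω = counts_and_outcomes(UniqueElements(), x)
# cts[i] is how often the outcome Ω[i] occurs in x, and sum(cts) == length(x).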

ComplexityMeasures.decodeFunction
decode(c::Encoding, i::Integer) -> ω

Decode an encoded element i into the outcome ω ∈ Ω it corresponds to. Ω is the outcome_space that uses encoding c.

ComplexityMeasures.distance_to_whitenoiseMethod
distance_to_whitenoise(estimator::ReverseDispersion, p::Probabilities;
    normalize = false)

Compute the distance of the probability distribution p from a uniform distribution, given the parameters of estimator (which must be known beforehand).

If normalize == true, then normalize the value to the interval [0, 1] by using the parameters of estimator.

Used to compute reverse dispersion entropy (ReverseDispersion) Li2019.

ComplexityMeasures.encodeFunction
encode(c::Encoding, χ) -> i::Int

Encode an element χ ∈ x of input data x (those given to e.g., counts) into the positive integers using encoding c. The special value of i = -1 is used as a return value for inappropriate elements χ that cannot be encoded according to c.
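
For example, a minimal sketch of the encode/decode round trip, assuming the OrdinalPatternEncoding encoding (documented elsewhere in this package):

enc = OrdinalPatternEncoding{3}() # encodes length-3 state vectors into ordinal patterns
i = encode(enc, [1.2, 0.4, 2.2])  # an integer in 1:factorial(3)
ω = decode(enc, i)                # the permutation pattern corresponding to i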

ComplexityMeasures.entropyMethod
entropy(args...)

entropy is nothing more than a call to information that will simply throw an error if used with an information measure that is not an entropy.

ComplexityMeasures.entropy_approxMethod
entropy_approx(x; m = 2, τ = 1, r = 0.2 * Statistics.std(x), base = MathConstants.e)

Convenience syntax for computing the approximate entropy (Pincus, 1991) for timeseries x.

This is just a wrapper for complexity(ApproximateEntropy(; m, τ, r, base), x) (see also ApproximateEntropy).

ComplexityMeasures.entropy_complexity_curvesMethod
entropy_complexity_curves(c::StatisticalComplexity;
    num_max=1, num_min=1000) -> (min_entropy_complexity, max_entropy_complexity)

Calculate the minimum and maximum complexity-entropy curves for the statistical complexity according to Rosso2007, using num_max * total_outcomes(c.o) different values of the normalized information measure of choice for the maximum complexity curve, and num_min different values for the minimum complexity curve.

This function can also be used to compute the maximum "complexity-extropy curve" if c.hest is e.g. ShannonExtropy, which is the equivalent of the complexity-entropy curves, but using extropy instead of entropy.

Description

The way the statistical complexity is designed, there is a minimum and maximum possible complexity for data with a given value of an information measure. The calculation time of the maximum complexity curve grows as O(total_outcomes(c.o)^2), and thus takes a very long time for high numbers of outcomes. This function is inspired by S. Sippel's implementation in statcomp Sippel2016.

This function will work with any ProbabilitiesEstimator where total_outcomes is known a priori.
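
As a hedged sketch, assuming StatisticalComplexity accepts its fields (such as o) as keyword arguments:

c = StatisticalComplexity(o = OrdinalPatterns{4}(1))
min_curve, max_curve = entropy_complexity_curves(c; num_min = 100)
# The two curves delimit the admissible complexity values for each value of the normalized information measure.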

ComplexityMeasures.entropy_dispersionMethod
entropy_dispersion(x; base = 2, kwargs...)

Compute the dispersion entropy. This function is just a convenience call to:

est = Dispersion(; kwargs...)
information(Shannon(base), est, x)

See Dispersion for more info.

ComplexityMeasures.entropy_distributionMethod
entropy_distribution(x; τ = 1, m = 3, n = 3, base = 2)

Compute the distribution entropy Li2015 of x using embedding dimension m with delay/lag τ, using the Chebyshev distance metric, and using an n-element equally-spaced binning over the distribution of distances to estimate probabilities.

This function is just a convenience call to:

x = rand(1000000)
o = SequentialPairDistances(x, n, m, τ, metric = Chebyshev())
h = information(Shannon(base = 2), o, x)

See SequentialPairDistances for more info.

ComplexityMeasures.entropy_waveletMethod
entropy_wavelet(x; wavelet = Wavelets.WT.Daubechies{12}(), base = 2)

Compute the wavelet entropy. This function is just a convenience call to:

est = WaveletOverlap(wavelet)
information(Shannon(base), est, x)

See WaveletOverlap for more info.

ComplexityMeasures.fasthist!Method
fasthist!(x) → c::Vector{Int}

Count the occurrences c of the unique data values in x, so that c[i] is the number of times the value sort!(unique(x))[i] occurs. Hence, this method is useful mostly when x contains integer or categorical data.

Prior to counting, x is sorted, so this function also mutates x. Therefore, it is called with copy in higher level API when necessary. This function works for any x for which sort!(x) works.

ComplexityMeasures.fasthist!Method
fasthist!(x, weights) → c::Vector{Real}

Similar to fasthist!(x), but here the weights are summed up for each unique entry of x. x is sorted just like in fasthist!(x).

ComplexityMeasures.fasthistMethod
fasthist(c::RectangularBinEncoding, x::Vector_or_SSSet)

Intermediate method that runs fasthist! in the encoded space and returns the encoded space histogram (counts) and corresponding bins. Also skips any instances of out-of-bound points for the histogram.

ComplexityMeasures.informationMethod
information(est::DifferentialInfoEstimator, x) → h::Real

Estimate a differential information measure using the provided DifferentialInfoEstimator and input data x.

Description

The overwhelming majority of differential estimators estimate the Shannon entropy. If the same estimator can estimate different information measures (e.g. it can estimate both Shannon and Tsallis), then the information measure is provided as an argument to the estimator itself.

See the table of differential information measure estimators in the docs for all differential information measure estimators.

Currently, unlike for the discrete information measures, this method doesn't involve explicitly first computing a probability density function and then passing this density to an information measure definition. But in the future, we want to establish a density API similar to the probabilities API.

Examples

To compute the differential version of a measure, give it as the first argument to a DifferentialInfoEstimator and pass it to information.

x = randn(1000)
h_sh = information(Kraskov(Shannon()), x)
h_vc = information(Vasicek(Shannon()), x)

A standard normal distribution has a base-e Shannon differential entropy of 0.5*log(2π) + 0.5 nats.

est = Kraskov(k = 5, base = ℯ) # Base `ℯ` for nats.
h = information(est, randn(2_000_000))
abs(h - 0.5*log(2π) - 0.5) # ≈ 0.0001
ComplexityMeasures.informationMethod
information([die::DiscreteInfoEstimator,] [est::ProbabilitiesEstimator,] o::OutcomeSpace, x) → h::Real
information(o::OutcomeSpace, x) → h::Real

Estimate a discrete information measure from input data x using the provided DiscreteInfoEstimator and ProbabilitiesEstimator over the given OutcomeSpace.

As an alternative, you can provide an InformationMeasure for the first argument (die), in which case PlugIn estimation is used. You may also skip the first argument (die), in which case Shannon() will be used. You may also skip the second argument (est), which will default to the RelativeAmount probabilities estimator. Note that some information measure estimators (e.g., GeneralizedSchuermann) operate directly on counts and hence ignore est.

information([e::DiscreteInfoEstimator,] p::Probabilities) → h::Real
information([e::DiscreteInfoEstimator,] c::Counts) → h::Real

Like above, but estimate the information measure from the pre-computed Probabilities p or Counts. Counts are converted into probabilities using RelativeAmount, unless the estimator e uses counts directly.

See also: information_maximum, information_normalized for a normalized version.

Examples (naive estimation)

The simplest way to estimate a discrete measure is to provide the InformationMeasure directly in combination with an OutcomeSpace. This will use the "naive" PlugIn estimator for the measure, and the "naive" RelativeAmount estimator for the probabilities.

x = randn(100) # some input data
o = ValueBinning(RectangularBinning(5)) # a 5-bin histogram outcome space
h_s = information(Shannon(), o, x)

Here are some more examples:

x = [rand(Bool) for _ in 1:10000] # coin toss
ps = probabilities(x) # gives about [0.5, 0.5] by definition
h = information(ps) # gives 1, about 1 bit by definition (Shannon entropy by default)
h = information(Shannon(), ps) # syntactically equivalent to the above
h = information(Shannon(), UniqueElements(), x) # syntactically equivalent to above
h = information(Renyi(2.0), ps) # also gives 1, order `q` doesn't matter for coin toss
h = information(OrdinalPatterns(;m=3), x) # gives about 2, again by definition

Examples (bias-corrected estimation)

It is known that both PlugIn estimation for information measures and RelativeAmount estimation for probabilities are biased. The scientific literature abounds with estimators that correct for this bias, both on the measure-estimation level and on the probability-estimation level. We thus provide the option to use any DiscreteInfoEstimator in combination with any ProbabilitiesEstimator for improved estimates. Note that custom probabilities estimators will only work with counting-compatible OutcomeSpaces.

x = randn(100)
o = ValueBinning(RectangularBinning(5))

# Estimate Shannon entropy estimation using various dedicated estimators
h_s = information(MillerMadow(Shannon()), RelativeAmount(), o, x)
h_s = information(HorvitzThompson(Shannon()), Shrinkage(), o, x)
h_s = information(Schuermann(Shannon()), Shrinkage(), o, x)

# Estimate information measures using the generic `Jackknife` estimator
h_r = information(Jackknife(Renyi()), Shrinkage(), o, x)
j_t = information(Jackknife(TsallisExtropy()), BayesianRegularization(), o, x)
j_r = information(Jackknife(RenyiExtropy()), RelativeAmount(), o, x)
ComplexityMeasures.information_maximumMethod
information_maximum(e::InformationMeasure, o::OutcomeSpace [, x])

Return the maximum value the given information measure can have, given input data x and the given outcome space (the OutcomeSpace may also be specified by a ProbabilitiesEstimator).

Like in outcome_space, for some outcome spaces, the possible outcomes are known without knowledge of input x, in which case the function dispatches to information_maximum(e, o).

information_maximum(e::InformationMeasure, L::Int)

The same as above, but computed directly from the number of total outcomes L.
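
For example:

information_maximum(Shannon(), OrdinalPatterns{3}(1)) # log2(3!) ≈ 2.585; outcomes are known without input data
information_maximum(Shannon(), 6)                     # the same, computed directly from the number of outcomes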

ComplexityMeasures.information_normalizedMethod
information_normalized([e::DiscreteInfoEstimator,] [est::ProbabilitiesEstimator,] o::OutcomeSpace, x) → h::Real

Estimate the normalized version of the given discrete information measure. This is just the value of information divided by its maximum possible value given o.

The same convenience syntaxes as in information can be used here.

Notice that there is no method information_normalized(e::DiscreteInfoEstimator, probs::Probabilities), because there is no way to know the number of possible outcomes (i.e., the total_outcomes) from probs.

Normalized values

For the PlugIn estimator, it is guaranteed that h̃ ∈ [0, 1]. For any other estimator, we can't guarantee this, since the estimator might over-correct. You should know what you're doing if using anything but PlugIn to estimate normalized values.
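
For example, using PlugIn-style estimation (so the value is guaranteed to lie in [0, 1]):

x = randn(10_000)
h_norm = information_normalized(Shannon(), OrdinalPatterns{3}(1), x) # normalized permutation entropy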

ComplexityMeasures.invariantmeasureMethod
invariantmeasure(x::AbstractStateSpaceSet, binning::RectangularBinning;
    rng = Random.default_rng()) → iv::InvariantMeasure

Estimate an invariant measure over the points in x based on binning the data into rectangular boxes dictated by the binning, then approximate the transfer (Perron-Frobenius) operator over the bins. From the approximation to the transfer operator, compute an invariant distribution over the bins. Assumes that the input data are sequential.

Details on the estimation procedure are found in the TransferOperator docstring.

Example

using DynamicalSystems
henon_rule(x, p, n) = SVector{2}(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
henon = DeterministicIteratedMap(henon_rule, zeros(2), [1.4, 0.3])
orbit, t = trajectory(henon, 20_000; Ttr = 10)

# Estimate the invariant measure over some coarse graining of the orbit.
iv = invariantmeasure(orbit, RectangularBinning(15))

# Get the probabilities and bins
invariantmeasure(iv)

Probabilities and bin information

invariantmeasure(iv::InvariantMeasure) → (ρ::Probabilities, bins::Vector{<:SVector})

From a pre-computed invariant measure, return the probabilities and associated bins. The element ρ[i] is the probability of visitation to the box bins[i].

Transfer operator approach vs. naive histogram approach

Why bother with the transfer operator instead of using regular histograms to obtain probabilities?

In fact, the naive histogram approach and the transfer operator approach are equivalent in the limit of long enough time series (as $n \to \infty$), which is guaranteed by the ergodic theorem. There is a crucial difference, however:

The naive histogram approach only gives the long-term probabilities that orbits visit a certain region of the state space. The transfer operator encodes that information too, but comes with the added benefit of knowing the transition probabilities between states (see transfermatrix).

See also: InvariantMeasure.

ComplexityMeasures.ith_order_statisticFunction
ith_order_statistic(ex, i::Int, n::Int = length(ex))

Return the i-th order statistic from the order statistics ex, requiring that Xᵢ = X₁ if i < 1 and Xᵢ = Xₙ if i > n.

ComplexityMeasures.log_with_baseMethod
log_with_base(base) → f

Return a function that computes the logarithm at the given base. Dispatching to a dedicated logarithm function for the given base increases accuracy, and likely also performance, compared to computing log(base, x) directly.

ComplexityMeasures.maxdistsMethod
maxdists(xᵢ, nns) → dists

Compute the maximum distance from xᵢ to the points xⱼ ∈ nns along each dimension, i.e. dists[k] = max(|xᵢ[k] - xⱼ[k]|) over all xⱼ ∈ nns.

ComplexityMeasures.multiscaleFunction
multiscale(algorithm::MultiScaleAlgorithm, [args...], x)

A convenience function to compute the multiscale version of any InformationMeasureEstimator or ComplexityEstimator.

The return type of multiscale is either a Vector{Real} or a Vector{Vector{Real}}, see the available coarse-graining methods below.

It utilizes downsample with the given algorithm to first produce coarse-grained, downsampled versions of x for scale factors algorithm.scales. Then, information or complexity, depending on the input arguments, is applied to each of the coarse-grained timeseries. If N = length(x), then the length of the most severely downsampled version of x is N ÷ maximum(algorithm.scales), while for scale factor 1, the original time series is considered.

Description

This function generalizes the multiscale entropy of Costa2002 to any discrete information measure, any differential information measure, and any other complexity measure.

Coarse-graining algorithms

The available downsampling routines are:

Examples

multiscale can be used with any discrete or differential information measure estimator. For example, here are two ways of computing multiscale Tsallis entropy:

using ComplexityMeasures
x = randn(1000)
downsampling = RegularDownsampling(scales = 1:5) # multiscale algorithm

# Symbolic (ordinal-pattern-based) probabilities estimation using Bayesian regularization,
# jackknife estimation of the entropy.
o = OrdinalPatterns{3}(2) # outcome space
probest = BayesianRegularization() # probabilities estimator
hest = Jackknife(Tsallis(q = 1.5)) # entropy estimator
multiscale(downsampling, hest, probest, o, x)

# Differential kNN-based estimator:
hest = LeonenkoProzantoSavani(Tsallis(q = 1.5), k = 10) # 10 neighbors
multiscale(downsampling, hest, x)

Multiscale variants of any ComplexityEstimator are also trivial to compute. Let's compute the "generalized multiscale sample entropy Costa2015" using the second-order moment.

using ComplexityMeasures, Statistics
multiscale(CompositeDownsampling(; f = Statistics.var), SampleEntropy(x), x)
ComplexityMeasures.n_borderpointsMethod
n_borderpoints(xᵢ, nns, dists) → ξ

Compute ξ, which is how many of xᵢ's neighbor points xⱼ ∈ nns fall on the border of the minimal-volume rectangle with xᵢ at its center.

dists[k] should be the maximum distance from xᵢ[k] to any other point along the k-th dimension, and length(dists) is the total dimension.

ComplexityMeasures.outcome_spaceMethod
outcome_space(o::OutcomeSpace, x) → Ω

Return a sorted container containing all possible outcomes of o for input x.

For some estimators the concrete outcome space is known without knowledge of input x, in which case the function dispatches to outcome_space(o). In general it is recommended to use the 2-argument version irrespective of the estimator.

ComplexityMeasures.outcomesMethod
outcomes(o::OutcomeSpace, x)

Return all (unique) outcomes that appear in the (encoded) input data x, according to the given OutcomeSpace. Equivalent to probabilities_and_outcomes(o, x)[2], but for some estimators it may be explicitly extended for better performance.

ComplexityMeasures.print_array_with_marginsMethod
print_array_with_margins(io::IO, x::AbstractArray{T, 1}, margin::AbstractVector) where T

Prints a length-N vector x alongside the margin elements, such that x[i] corresponds to the marginal element margin[i].

ComplexityMeasures.probabilitiesFunction
probabilities(
    [est::ProbabilitiesEstimator], o::OutcomeSpace, x::Array_or_SSSet
) → p::Probabilities

Compute the same probabilities as in the probabilities_and_outcomes function, with two differences:

  1. Do not explicitly return the outcomes.
  2. If the outcomes are not estimated for free while estimating the counts, a special integer type is used to enumerate the outcomes, to avoid the computational cost of estimating the outcomes.
probabilities([est::ProbabilitiesEstimator], counts::Counts) → p::Probabilities

The same as above, but estimate the probabilities directly from a set of Counts.

ComplexityMeasures.probabilities_and_outcomesFunction
probabilities_and_outcomes(
    [est::ProbabilitiesEstimator], o::OutcomeSpace, x::Array_or_SSSet
) → (p::Probabilities, Ω)

Estimate a probability distribution over the set of possible outcomes Ω defined by the OutcomeSpace o, given input data x. Probabilities are estimated according to the given probabilities estimator est, which defaults to RelativeAmount.

The input data is typically an Array or a StateSpaceSet (or SSSet for short); see Input data for ComplexityMeasures.jl. Configuration options are always given as arguments to the chosen outcome space and probabilities estimator.

Return a tuple where the first element is a Probabilities instance, which is vector-like and contains the probabilities, and where the second element Ω are the outcomes corresponding to the probabilities, such that p[i] is the probability for the outcome Ω[i].

The outcomes are actually included in p, and you can use the outcomes function on the p to get them. probabilities_and_outcomes returns both for backwards compatibility.

probabilities_and_outcomes(
    [est::ProbabilitiesEstimator], counts::Counts
) → (p::Probabilities, Ω)

Estimate probabilities from the pre-computed counts using the given ProbabilitiesEstimator est.

Description

Probabilities are computed by:

  1. Discretizing/encoding x into a finite set of outcomes Ω specified by the provided OutcomeSpace o.
  2. Assigning to each outcome Ωᵢ ∈ Ω either a count (how often it appears among the discretized data points), or a pseudo-count (some pre-normalized probability, such that the pseudo-counts over all Ωᵢ ∈ Ω sum to 1).

For outcome spaces that result in pseudo counts, such as PowerSpectrum, these pseudo counts are simply treated as probabilities and returned directly (that is, est is ignored). For counting-based outcome spaces (see OutcomeSpace docstring), probabilities are estimated from the counts using some ProbabilitiesEstimator (first signature).

Observed vs all probabilities

Due to performance optimizations, whether the returned probabilities contain 0s as entries or not depends on the outcome space. E.g., in ValueBinning 0s are skipped, while in PowerSpectrum 0s are not skipped, because we get them for free.

Use allprobabilities_and_outcomes to guarantee that zero probabilities are also returned (may be slower).
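
For example, a minimal sketch:

x = randn(10_000)
o = ValueBinning(RectangularBinning(10))
p, Ω = probabilities_and_outcomes(o, x)                             # RelativeAmount (the default)
p2, Ω2 = probabilities_and_outcomes(BayesianRegularization(), o, x) # a bias-corrected estimator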

ComplexityMeasures.relevant_fieldnamesMethod
relevant_fieldnames(x::T) → names::Vector{Symbol}

Internal method that returns the relevant field names to be printed for type T. For example, for Encodings, the implementation is simply

relevant_fieldnames(e::Encoding) = fieldnames(typeof(e))

Individual types can override this method if special printing is desired.

ComplexityMeasures.sample_entropy_probsMethod
sample_entropy_probs(x; k::Int = 2, m::Int = 2, τ::Int = 1, r = 0.2 * Statistics.std(x))

Compute the probabilities required for entropy_sample. k is the embedding dimension, τ is the embedding lag, and m is a normalization constant (so that we consider the same number of points for both the m-dimensional and the m+1-dimensional embeddings), and r is the radius.

ComplexityMeasures.special_typeparameter_infoMethod
special_typeparameter_info(::Type{T})

Returns a string containing any extra type parameter information to be printed for a given type.

Defaults to nothing, but for types like OrdinalPatterns, we'd like to display the type parameter m, as in OrdinalPatterns{m}(...).

ComplexityMeasures.stencil_lengthFunction
stencil_length(stencil::Vector{CartesianIndex{D}}) where D → length(stencil)
stencil_length(stencil::NTuple{2, NTuple{D, T}}) where {D, T} → prod(stencil[1])
stencil_length(stencil::Array{Int, D}) where D → sum(stencil)

Count the number of elements in the stencil.

ComplexityMeasures.total_outcomesMethod
total_outcomes(o::OutcomeSpace, x)

Return the length (cardinality) of the outcome space $\Omega$ of o.

For some OutcomeSpaces, the cardinality is known without knowledge of input x, in which case the function dispatches to total_outcomes(o). In general it is recommended to use the 2-argument version irrespective of the estimator.

ComplexityMeasures.transfermatrixMethod
transfermatrix(iv::InvariantMeasure) → (M::AbstractArray{<:Real, 2}, bins::Vector{<:SVector})

Return the transfer matrix/operator and corresponding bins. Here, bins[i] corresponds to the i-th row/column of the transfer matrix. Thus, the entry M[i, j] is the probability of jumping from the state defined by bins[i] to the state defined by bins[j].

See also: TransferOperator.

ComplexityMeasures.transferoperatorMethod
transferoperator(pts::AbstractStateSpaceSet,
    binning::RectangularBinning) → TransferOperatorApproximationRectangular

Estimate the transfer operator given a set of sequentially ordered points subject to a rectangular partition given by the binning.

ComplexityMeasures.type_printcolorMethod
type_printcolor(x) → color::Symbol

The color in which to print the type name for type typeof(x).

Can be used to distinguish different types, so that nested printing looks better (e.g. Encodings inside OutcomeSpaces).

ComplexityMeasures.volume_minimal_rectMethod
volume_minimal_rect(xᵢ, nns) → vol

Compute the volume of the minimal enclosing rectangle with xᵢ at its center and containing all points nᵢ ∈ nns either within the rectangle or on one of its borders.

This function respects the coordinate system of the input data, i.e. it does not perform any rotation (which would be computationally more demanding because we'd need to find the convex hull of nns, but could potentially give more accurate results).

ComplexityMeasures.volume_minimal_rectMethod
volume_minimal_rect(dists) → vol

Compute the volume of a (hyper)-rectangle where the distance from its centre along the k-th dimension is given by dists[k], and length(dists) is the total dimension.

ComplexityMeasures.GroupSlices.firstindsMethod
firstinds(ic::Vector{Int})
firstinds(ib::Vector{Vector{Int}})

Returns a vector of integers containing the first index position of each unique value in the input integer vector ic, or the first index position of each entry in the input vector of integer vectors ib. When operating on the output returned from unique(A, dim), the returned vector of integers corresponds to the positions of the first of each unique slice present in the original input multidimensional array A along dimension dim. The implementation of firstinds accepting a vector of integers operates on the output returned from groupslices(A, dim). The implementation of firstinds accepting a vector of vectors of integers operates on the output returned from groupinds(ic::Vector{Int}).

ComplexityMeasures.GroupSlices.groupindsMethod
groupinds(ic)

Returns a vector of vectors of integers wherein the vector of group slice index integers as returned from groupslices(A, dim) is converted into a grouped vector of vectors. Each vector entry in the returned vector of vectors contains all of the positional indices of slices in the original input array A that correspond to the unique slices along dimension dim that are present in the array C as returned from unique(A, dim).

ComplexityMeasures.GroupSlices.groupslicesMethod
groupslices(V::AbstractVector)

Returns a vector of integers the same length as V, which in place of each entry x has the index of the first entry of V which is equal to x. This is true:

all(x == V[i] for (x,i) in zip(V, groupslices(V)))
ComplexityMeasures.GroupSlices.groupslicesMethod
groupslices(A; dims) = groupslices(A, dims)

Returns a vector of integers where each integer element of the returned vector is a group number corresponding to the unique slices along dimension dims as returned from unique(A; dims=d), where A can be a multidimensional array. The default is dims = 1. Example usage: If C = unique(A; dims=dims), ic = groupslices(A, dims), and ndims(A) == ndims(C) == 3, then:

if dims == 1
   all(A .== C[ic,:,:])
elseif dims == 2
   all(A .== C[:,ic,:])
elseif dims == 3
   all(A .== C[:,:,ic])
end
ComplexityMeasures.GroupSlices.lastindsMethod
lastinds(ic::Vector{Int})

Returns a vector of integers containing the last index position of each unique value in the input integer vector ic. When operating on the output returned from groupinds(unique(A, dim)), the returned vector of integers corresponds to the positions of the last of each unique slice present in the original input multidimensional array A along dimension dim. The implementation of lastinds accepting a vector of vectors of integers operates on the output returned from groupinds(ic::Vector{Int}).