`ComplexityMeasures.ComplexityMeasures`

— Module**ComplexityMeasures.jl**

ComplexityMeasures.jl is a Julia-based software for calculating 1000s of various kinds of probabilities, entropies, and other so-called *complexity measures* from a single-variable input datasets. For relational measures across many input datasets see its extension CausalityTools.jl. If you are a user of other programming languages (Python, R, MATLAB, ...), you can still use ComplexityMeasures.jl due to Julia's interoperability. For example, for Python use `juliacall`

.

A careful comparison with alternative widely used software shows that ComplexityMeasures.jl outclasses the alternatives in several objective aspects of comparison, such as computational performance, overall amount of measures, reliability, and extendability. See the associated publication for more details.

The key features that it provides can be summarized as:

- A rigorous framework for extracting probabilities from data, based on the mathematical formulation of probability spaces.
- Several (12+) outcome spaces, i.e., ways to discretize data into probabilities.
- Several estimators for estimating probabilities given an outcome space, which correct theoretically known estimation biases.
- Several definitions of information measures, such as various flavours of entropies (Shannon, Tsallis, Curado...), extropies, and other complexity measures, that are used in the context of nonlinear dynamics, nonlinear timeseries analysis, and complex systems.
- Several discrete and continuous (differential) estimators for entropies, which correct theoretically known estimation biases.
- An extendable interface and well thought out API accompanied by dedicated developer documentation. This makes it trivial to define new outcome spaces, or new estimators for probabilities, information measures, or complexity measures and integrate them with everything else in the software without boilerplate code.

ComplexityMeasures.jl can be used as a standalone package, or as part of other projects in the JuliaDynamics organization, such as DynamicalSystems.jl or CausalityTools.jl.

To install it, run `import Pkg; Pkg.add("ComplexityMeasures")`

.

All further information is provided in the documentation, which you can either find online or build locally by running the `docs/make.jl`

file.

*Previously, this package was called Entropies.jl.*

`ComplexityMeasures.our_abstract_types`

— Constant`const our_abstract_types`

The types from out package which we want to pretty-print.

`ComplexityMeasures.AbstractBinning`

— Type`AbstractBinning`

Supertype encompassing `RectangularBinning`

and `FixedRectangularBinning`

.

`ComplexityMeasures.AddConstant`

— Type```
AddConstant <: ProbabilitiesEstimator
AddConstant(; c = 1.0)
```

A generic add-constant probabilities estimator for counting-based `OutcomeSpace`

s, where several literature estimators can be obtained tuning `c`

. Currently $c$ can only be a scalar.

`c = 1.0`

is the Laplace estimator, or the "add-one" estimator.

**Description**

Probabilities for the $k$-th outcome $\omega_{k}$ are estimated as

\[p(\omega_k) = \dfrac{(n_k + c)}{n + mc},\]

where $m$ is the cardinality of the outcome space, and $n$ is the number of (encoded) input data points, and $n_k$ is the number of times the outcome $\omega_{k}$ is observed in the (encoded) input data points.

If the `AddConstant`

estimator used with `probabilities_and_outcomes`

, then $m$ is set to the number of *observed* outcomes. If used with `allprobabilities_and_outcomes`

, then $m$ is set to the number of *possible* outcomes.

Looking at the formula above, if $n_k = 0$, then unobserved outcomes are assigned a non-zero probability of $\dfrac{c}{n + mc}$. This means that if the estimator is used with `allprobabilities_and_outcomes`

, then all outcomes, even those that are not observed, are assigned non-zero probabilities. This might affect your results if using e.g. `missing_outcomes`

.

`ComplexityMeasures.AlizadehArghami`

— Type```
AlizadehArghami <: DifferentialInfoEstimator
AlizadehArghami(definition = Shannon(); m::Int = 1)
```

The `AlizadehArghami`

estimator computes the `Shannon`

differential `information`

of a timeseries using the method from Alizadeh2010, with logarithms to the `base`

specified in `definition`

.

The `AlizadehArghami`

estimator belongs to a class of differential entropy estimators based on order statistics. It only works for *timeseries* input.

**Description**

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function$f : \mathbb{R} \to \mathbb{R}$. `AlizadehArghami`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$:

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp.\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The `AlizadehArghami`

`Shannon`

differential entropy estimate is then the the `Vasicek`

estimate $\hat{H}_{V}(\bar{X}, m, n)$, plus a correction factor

\[\hat{H}_{A}(\bar{X}, m, n) = \hat{H}_{V}(\bar{X}, m, n) + \dfrac{2}{n}\left(m \log(2) \right).\]

See also: `information`

, `Correa`

, `Ebrahimi`

, `Vasicek`

, `DifferentialInfoEstimator`

.

`ComplexityMeasures.AmplitudeAwareOrdinalPatterns`

— Type```
AmplitudeAwareOrdinalPatterns <: OutcomeSpace
AmplitudeAwareOrdinalPatterns{m}(τ = 1, A = 0.5, lt = ComplexityMeasures.isless_rand)
```

A variant of `OrdinalPatterns`

that also incorporates amplitude information, based on the amplitude-aware permutation entropy Azami2016. The outcome space and arguments are the same as in `OrdinalPatterns`

.

**Description**

Similarly to `WeightedOrdinalPatterns`

, a weight $w_i$ is attached to each ordinal pattern extracted from each state (or delay) vector $\mathbf{x}_i = (x_1^i, x_2^i, \ldots, x_m^i)$ as

\[w_i = \dfrac{A}{m} \sum_{k=1}^m |x_k^i | + \dfrac{1-A}{d-1} \sum_{k=2}^d |x_{k}^i - x_{k-1}^i|,\]

with $0 \leq A \leq 1$. When $A=0$ , only internal differences between the elements of $\mathbf{x}_i$ are weighted. Only mean amplitude of the state vector elements are weighted when $A=1$. With, $0<A<1$, a combined weighting is used.

`ComplexityMeasures.ApproximateEntropy`

— Type```
ApproximateEntropy <: ComplexityEstimator
ApproximateEntropy([x]; r = 0.2std(x), kwargs...)
```

An estimator for the approximate entropy Pincus1991 complexity measure, used with `complexity`

.

The keyword argument `r`

is mandatory if an input timeseries `x`

is not provided.

**Keyword arguments**

`r::Real`

: The radius used when querying for nearest neighbors around points. Its value should be determined from the input data, for example as some proportion of the standard deviation of the data.`m::Int = 2`

: The embedding dimension.`τ::Int = 1`

: The embedding lag.`base::Real = MathConstants.e`

: The base to use for the logarithm. Pincus (1991) uses the natural logarithm.

**Description**

Approximate entropy (ApEn) is defined as

\[ApEn(m ,r) = \lim_{N \to \infty} \left[ \phi(x, m, r) - \phi(x, m + 1, r) \right].\]

Approximate entropy is estimated for a timeseries `x`

, by first embedding `x`

using embedding dimension `m`

and embedding lag `τ`

, then searching for similar vectors within tolerance radius `r`

, using the estimator described below, with logarithms to the given `base`

(natural logarithm is used in Pincus, 1991).

Specifically, for a finite-length timeseries `x`

, an estimator for $ApEn(m ,r)$ is

\[ApEn(m, r, N) = \phi(x, m, r, N) - \phi(x, m + 1, r, N),\]

where `N = length(x)`

and

\[\phi(x, k, r, N) = \dfrac{1}{N-(k-1)\tau} \sum_{i=1}^{N - (k-1)\tau} \log{\left( \sum_{j = 1}^{N-(k-1)\tau} \dfrac{\theta(d({\bf x}_i^m, {\bf x}_j^m) \leq r)}{N-(k-1)\tau} \right)}.\]

Here, $\theta(\cdot)$ returns 1 if the argument is true and 0 otherwise, $d({\bf x}_i, {\bf x}_j)$ returns the Chebyshev distance between vectors ${\bf x}_i$ and ${\bf x}_j$, and the `k`

-dimensional embedding vectors are constructed from the input timeseries $x(t)$ as

\[{\bf x}_i^k = (x(i), x(i+τ), x(i+2τ), \ldots, x(i+(k-1)\tau)).\]

In the original paper, they fix `τ = 1`

. In our implementation, the normalization constant is modified to account for embeddings with `τ != 1`

.

`ComplexityMeasures.BayesianRegularization`

— Type```
BayesianRegularization <: ProbabilitiesEstimator
BayesianRegularization(; a = 1.0)
```

The `BayesianRegularization`

estimator is used with `probabilities`

and related functions to estimate probabilities an `m`

-element counting-based `OutcomeSpace`

using Bayesian regularization of cell counts Hausser2009. See `ProbabilitiesEstimator`

for usage.

**Outcome space requirements**

This estimator only works with counting-compatible outcome spaces.

**Description**

The `BayesianRegularization`

estimator estimates the probability of the $k$-th outcome $\omega_{k}$ is

\[\omega_{k}^{\text{BayesianRegularization}} = \dfrac{n_k + a_k}{n + A},\]

where $n$ is the number of samples in the input data, $n_k$ is the observed counts for the outcome $\omega_{k}$, and $A = \sum_{i=1}^k a_k$.

**Picking a**

There are many common choices of priors, some of which are listed in Hausser2009. They include

`a == 0`

, which is equivalent to the`RelativeAmount`

estimator.`a == 0.5`

(Jeffrey's prior)`a == 1`

(Bayes-Laplace uniform prior)

`a`

can also be chosen as a vector of real numbers. Then, if used with `allprobabilities_and_outcomes`

, it is required that `length(a) == total_outcomes(o, x)`

, where `x`

is the input data and `o`

is the `OutcomeSpace`

. If used with `probabilities`

, then `length(a)`

must match the number of *observed* outcomes (you can check this using `probabilities_and_outcomes`

). The choice of `a`

can severely impact the estimation errors of the probabilities, and the errors depend both on the choice of `a`

and on the sampling scenario Hausser2009.

**Assumptions**

The `BayesianRegularization`

estimator assumes a fixed and known `m`

. Thus, using it with `probabilities_and_outcomes`

and `allprobabilities_and_outcomes`

will yield different results, depending on whether all outcomes are observed in the input data or not. For `probabilities_and_outcomes`

, `m`

is the number of *observed* outcomes. For `allprobabilities_and_outcomes`

, `m = total_outcomes(o, x)`

, where `o`

is the `OutcomeSpace`

and `x`

is the input data.

If used with `allprobabilities_and_outcomes`

, then outcomes which have not been observed may be assigned non-zero probabilities. This might affect your results if using e.g. `missing_outcomes`

.

**Examples**

```
using ComplexityMeasures
x = cumsum(randn(100))
ps_bayes = probabilities(BayesianRegularization(a = 0.5), OrdinalPatterns{3}(), x)
```

See also: `RelativeAmount`

, `Shrinkage`

.

`ComplexityMeasures.BubbleEntropy`

— Type```
BubbleEntropy <: ComplexityEstimator
BubbleEntropy(; m = 3, τ = 1, definition = Renyi(q = 2))
```

The `BubbleEntropy`

complexity estimator Manis2017 is just a difference between two entropies, each computed with the `BubbleSortSwaps`

outcome space, for embedding dimensions `m + 1`

and `m`

, respectively.

Manis2017 use the `Renyi`

entropy of order `q = 2`

as the information measure `definition`

, but here you can use any `InformationMeasure`

. Manis2017 formulates the "bubble entropy" as the normalized measure below, while here you can also compute the unnormalized measure.

**Definition**

For input data `x`

, the "bubble entropy" is computed by first embedding the input data using embedding dimension `m`

and embedding delay `τ`

(call the embedded pts `y`

), and then computing the difference between the two entropies:

\[BubbleEn_T(τ) = H_T(y, m + 1) - H_T(y, m)\]

where $H_T(y, m)$ and $H_T(y, m + 1)$ are entropies of type $T$ (e.g. `Renyi`

) computed with the input data `x`

embedded to dimension $m$ and $m+1$, respectively. Use `complexity`

to compute this non-normalized version. Use `complexity_normalized`

to compute the normalized difference of entropies:

\[BubbleEn_H(τ)^{norm} = \dfrac{H_T(x, m + 1) - H_T(x, m)}{max(H_T(x, m + 1)) - max(H_T(x, m))},\]

where the maximum of the entropies for dimensions `m`

and `m + 1`

are computed using `information_maximum`

.

**Example**

```
using ComplexityMeasures
x = rand(1000)
est = BubbleEntropy(m = 5, τ = 3)
complexity(est, x)
```

`ComplexityMeasures.BubbleSortSwaps`

— Type```
BubbleSortSwaps <: CountBasedOutcomeSpace
BubbleSortSwaps(; m = 3, τ = 1)
```

The `BubbleSortSwaps`

outcome space is based on Manis2017's paper on "bubble entropy".

**Description**

`BubbleSortSwaps`

does the following:

- Embeds the input data using embedding dimension
`m`

and embedding lag`τ`

- For each state vector in the embedding, counting how many swaps are necessary for the bubble sort algorithm to sort state vectors.

For `counts_and_outcomes`

, we then define a distribution over the number of necessary swaps. This distribution can then be used to estimate probabilities using `probabilities_and_outcomes`

, which again can be used to estimate any `InformationMeasure`

. An example of how to compute the "Shannon bubble entropy" is given below.

**Outcome space**

The `outcome_space`

for `BubbleSortSwaps`

are the integers `0:N`

, where `N = (m * (m - 1)) / 2 + 1`

(the worst-case number of swaps). Hence, the number of `total_outcomes`

is `N + 1`

.

**Implements**

`codify`

. Returns the number of swaps required for each embedded state vector.

**Examples**

With the `BubbleSortSwaps`

outcome space, we can easily compute a "bubble entropy" inspired by Manis2017. Note: this is not actually a new entropy - it is just a new way of discretizing the input data. To reproduce the bubble entropy complexity measure from Manis2017, see `BubbleEntropy`

.

**Examples**

```
using ComplexityMeasures
x = rand(100000)
o = BubbleSortSwaps(; m = 5) # 5-dimensional embedding vectors
information(Shannon(; base = 2), o, x)
# We can also compute any other "bubble quantity", for example the
# "Tsallis bubble extropy", with arbitrary probabilities estimators:
information(TsallisExtropy(), BayesianRegularization(), o, x)
```

`ComplexityMeasures.BubbleSortSwapsEncoding`

— Type```
BubbleSortSwapsEncoding <: Encoding
BubbleSortSwapsEncoding{m}()
```

`BubbleSortSwapsEncoding`

is used with `encode`

to encode a length-`m`

input vector `x`

into an integer in the range `ω ∈ 0:((m*(m-1)) ÷ 2)`

, by counting the number of swaps required for the bubble sort algorithm to sort `x`

in ascending order.

`decode`

is not implemented for this encoding.

**Example**

```
using ComplexityMeasures
x = [1, 5, 3, 1, 2]
e = BubbleSortSwapsEncoding{5}() # constructor type argument must match length of vector
encode(e, x)
```

`ComplexityMeasures.ChaoShen`

— Type```
ChaoShen <: DiscreteInfoEstimatorShannon
ChaoShen(definition::Shannon = Shannon())
```

The `ChaoShen`

estimator is used with `information`

to compute the discrete `Shannon`

entropy according to Chao2003.

**Description**

This estimator is a modification of the `HorvitzThompson`

estimator that multiplies each plugin probability estimate by an estimate of sample coverage. If $f_1$ is the number of singletons (outcomes that occur only once) in a sample of length $N$, then the sample coverage is $C = 1 - \dfrac{f_1}{N}$. The Chao-Shen estimator of Shannon entropy is then

\[H_S^{CS} = -\sum_{i=1}^M \left( \dfrac{C p_i \log(C p_i)}{1 - (1 - C p_i)^N} \right),\]

where $N$ is the sample size and $M$ is the number of `outcomes`

. If $f_1 = N$, then $f_1$ is set to $f_1 = N - 1$ to ensure positive entropy Arora2022.

`ComplexityMeasures.CombinationEncoding`

— Type```
CombinationEncoding <: Encoding
CombinationEncoding(encodings)
```

A `CombinationEncoding`

takes multiple `Encoding`

s and creates a combined encoding that can be used to encode inputs that are compatible with the given `encodings`

.

**Encoding/decoding**

When used with `encode`

, each `Encoding`

in `encodings`

returns integers in the set `1, 2, …, n_e`

, where `n_e`

is the total number of outcomes for a particular encoding. For `k`

different encodings, we can thus construct the cartesian coordinate `(c₁, c₂, …, cₖ)`

(`cᵢ ∈ 1, 2, …, n_i`

), which can uniquely be identified by an integer. We can thus identify each unique *combined* encoding with a single integer.

When used with `decode`

, the integer symbol is converted to its corresponding cartesian coordinate, which is used to retrieve the decoded symbols for each of the encodings, and a tuple of the decoded symbols are returned.

The total number of outcomes is `prod(total_outcomes(e) for e in encodings)`

.

**Examples**

```
using ComplexityMeasures
# We want to encode the vector `x`.
x = [0.9, 0.2, 0.3]
# To do so, we will use a combination of first-difference encoding, amplitude encoding,
# and ordinal pattern encoding.
encodings = (
RelativeFirstDifferenceEncoding(0, 1; n = 2),
RelativeMeanEncoding(0, 1; n = 5),
OrdinalPatternEncoding(3) # x is a three-element vector
)
c = CombinationEncoding(encodings)
# Encode `x` as integer
ω = encode(c, x)
# Decode symbol (into a vector of decodings, one for each encodings `e ∈ encodings`).
# In this particular case, the first two element will be left-bin edges, and
# the last element will be the decoded ordinal pattern (indices that would sort `x`).
d = decode(c, ω)
```

`ComplexityMeasures.ComplexityEstimator`

— Type`ComplexityEstimator`

Supertype for estimators for various complexity measures that are not entropies in the strict mathematical sense.

See `complexity`

for all available estimators.

`ComplexityMeasures.CompositeDownsampling`

— Type```
CompositeDownsampling <: MultiScaleAlgorithm
CompositeDownsampling(; f::Function = Statistics.mean, scales = 1:8)
```

Composite multi-scale algorithm for multiscale entropy analysis Wu2013, used with `multiscale`

to compute, for example, composite multiscale entropy (CMSE).

**Description**

Given a scalar-valued input time series `x`

, the composite multiscale algorithm, like `RegularDownsampling`

, downsamples and coarse-grains `x`

by splitting it into non-overlapping windows of length `s`

, and then constructing downsampled time series by applying the function `f`

to each of the resulting length-`s`

windows.

However, Wu2013 realized that for each scale `s`

, there are actually `s`

different ways of selecting windows, depending on where indexing starts/ends. These `s`

different downsampled time series `D_t(s, f)`

at each scale `s`

are constructed as follows:

\[\{ D_{k}(s) \} = \{ D_{t, k}(s) \}_{t = 1}^{L}, = \{ f \left( \bf x_{t, k} \right) \} = \left\{ {f\left( (x_i)_{i = (t - 1)s + k}^{ts + k - 1} \right)} \right\}_{t = 1}^{L},\]

where `L = floor((N - s + 1) / s)`

and `1 ≤ k ≤ s`

, such that $D_{i, k}(s)$ is the `i`

-th element of the `k`

-th downsampled time series at scale `s`

.

Finally, compute $\dfrac{1}{s} \sum_{k = 1}^s g(D_{k}(s))$, where `g`

is some summary function, for example `information`

or `complexity`

.

**Keyword Arguments**

. The downsampling levels. If`scales`

`scales`

is set to an integer, then this integer is taken as maximum number of scales (i.e. levels of downsampling), and downsampling is done over levels`1:scales`

. Otherwise, downsampling is done over the provided`scales`

(which may be a range, or some specific scales (e.g.`scales = [1, 5, 6]`

). The maximum scale level is`length(x) ÷ 2`

, but to avoid applying the method to time series that are extremely short, consider limiting the maximum scale (e.g.`scales = length(x) ÷ 5`

).

The downsampled time series $D_{t, 1}(s)$ constructed using the composite multiscale method is equivalent to the downsampled time series $D_{t}(s)$ constructed using the `RegularDownsampling`

method, for which `k == 1`

is fixed, such that only a single time series is returned.

See also: `RegularDownsampling`

.

`ComplexityMeasures.Correa`

— Type```
Correa <: DifferentialInfoEstimator
Correa(definition = Shannon(); m::Int = 1)
```

The `Correa`

estimator computes the `Shannon`

differential `information`

of a timeseries using the method from Correa1995, with logarithms to the `base`

specified in `definition`

.

The `Correa`

estimator belongs to a class of differential entropy estimators based on order statistics. It only works for *timeseries* input.

**Description**

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function$f : \mathbb{R} \to \mathbb{R}$. `Correa`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, `Correa`

makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$, ensuring that end points are included. The `Correa`

estimate of `Shannon`

differential entropy is then

\[H_C(\bar{X}, m, n) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{ \sum_{j=i-m}^{i+m}(\bar{X}_{(j)} - \tilde{X}_{(i)})(j - i)}{n \sum_{j=i-m}^{i+m} (\bar{X}_{(j)} - \tilde{X}_{(i)})^2} \right],\]

where

\[\tilde{X}_{(i)} = \dfrac{1}{2m + 1} \sum_{j = i - m}^{i + m} X_{(j)}.\]

See also: `information`

, `AlizadehArghami`

, `Ebrahimi`

, `Vasicek`

, `DifferentialInfoEstimator`

.

`ComplexityMeasures.CosineSimilarityBinning`

— Type`CosineSimilarityBinning(; m::Int, τ::Int, nbins::Int)`

A `OutcomeSpace`

based on the cosine similarity Wang2020.

It can be used with `information`

to compute the "diversity entropy" of an input timeseries Wang2020.

The implementation here allows for `τ != 1`

, which was not considered in the original paper.

**Description**

CosineSimilarityBinning probabilities are computed as follows.

- From the input time series
`x`

, using embedding lag`τ`

and embedding dimension`m`

, construct the embedding $Y = \{\bf x_i \} = \{(x_{i}, x_{i+\tau}, x_{i+2\tau}, \ldots, x_{i+m\tau - 1}\}_{i = 1}^{N-mτ}$. - Compute $D = \{d(\bf x_t, \bf x_{t+1}) \}_{t=1}^{N-mτ-1}$, where $d(\cdot, \cdot)$ is the cosine similarity between two
`m`

-dimensional vectors in the embedding. - Divide the interval
`[-1, 1]`

into`nbins`

equally sized subintervals (including the value`+1`

). - Construct a histogram of cosine similarities $d \in D$ over those subintervals.
- Sum-normalize the histogram to obtain probabilities.

**Outcome space**

The outcome space for `CosineSimilarityBinning`

is the bins of the `[-1, 1]`

interval, and the return configuration is the same as in `ValueBinning`

(left bin edge).

**Implements**

`codify`

. Used for encoding inputs where ordering matters (e.g. time series).

`ComplexityMeasures.Counts`

— Type```
Counts <: Array{<:Integer, N}
Counts(counts [, outcomes [, dimlabels]]) → c
```

`Counts`

stores an `N`

-dimensional array of integer `counts`

corresponding to a set of `outcomes`

. This is typically called a "frequency table" or "contingency table".

If `c isa Counts`

, then `c.outcomes[i]`

is an abstract vector containing the outcomes along the `i`

-th dimension, where `c[i][j]`

is the count corresponding to the outcome `c.outcomes[i][j]`

, and `c.dimlabels[i]`

is the label of the `i`

-th dimension. Both labels and outcomes are assigned automatically if not given. `c`

itself can be manipulated and iterated over like its stored array.

`ComplexityMeasures.Curado`

— Type```
Curado <: InformationMeasure
Curado(; b = 1.0)
```

The Curado entropy Curado2004, used with `information`

to compute

\[H_C(p) = \left( \sum_{i=1}^N e^{-b p_i} \right) + e^{-b} - 1,\]

with `b ∈ ℛ, b > 0`

, and the terms outside the sum ensures that $H_C(0) = H_C(1) = 0$.

The maximum entropy for `Curado`

is $L(1 - \exp(-b/L)) + \exp(-b) - 1$ with $L$ the `total_outcomes`

.

`ComplexityMeasures.DifferentialInfoEstimator`

— Type`DifferentialInfoEstimator`

The supertype of all differential information measure estimators. These estimators compute an information measure in various ways that do not involve explicitly estimating a probability distribution.

Each `DifferentialInfoEstimator`

s uses a specialized technique to approximate relevant densities/integrals, and is often tailored to one or a few types of information measures. For example, `Kraskov`

estimates the `Shannon`

entropy.

See `information`

for usage.

**Implementations**

`ComplexityMeasures.DiscreteInfoEstimator`

— Type`DiscreteInfoEstimator`

The supertype of all discrete information measure estimators, which are used in combination with a `ProbabilitiesEstimator`

as input to `information`

or related functions.

The first argument to a discrete estimator is always an `InformationMeasure`

(defaults to `Shannon`

).

**Description**

A discrete `InformationMeasure`

is a functional of a probability mass function. To estimate such a measure from data, we must first estimate a probability mass function using a `ProbabilitiesEstimator`

from the (encoded/discretized) input data, and then apply the estimator to the estimated probabilities. For example, the `Shannon`

entropy is typically computed using the `RelativeAmount`

estimator to compute probabilities, which are then given to the `PlugIn`

estimator. Many other estimators exist, not only for `Shannon`

entropy, but other information measures as well.

We provide a library of both generic estimators such as `PlugIn`

or `Jackknife`

(which can be applied to any measure), as well as dedicated estimators such as `MillerMadow`

, which computes `Shannon`

entropy using the Miller-Madow bias correction. The list below gives a complete overview.

**Implementations**

The following estimators are generic and can compute any `InformationMeasure`

.

`PlugIn`

. The default, generic plug-in estimator of any information measure. It computes the measure exactly as stated in the definition, using the computed probability mass function.`Jackknife`

. Uses the a combination of the plug-in estimator and the jackknife principle to estimate the information measure.

`Shannon`

entropy estimators

The following estimators are dedicated `Shannon`

entropy estimators, which provide improvements over the naive `PlugIn`

estimator.

Any of the implemented `DiscreteInfoEstimator`

s can be used in combination with *any* `ProbabilitiesEstimator`

as input to `information`

. What this means is that every estimator actually comes in many different variants - one for each `ProbabilitiesEstimator`

. For example, the `MillerMadow`

estimator of `Shannon`

entropy is typically calculated with `RelativeAmount`

probabilities. But here, you can use for example the `BayesianRegularization`

or the `Shrinkage`

probabilities estimators instead, i.e. `information(MillerMadow(), RelativeAmount(outcome_space), x)`

and `information(MillerMadow(), BayesianRegularization(outcomes_space), x)`

are distinct estimators. This holds for all `DiscreteInfoEstimator`

s. Many of these estimators haven't been explored in the literature before, so feel free to explore, and please cite this software if you use it to explore some new estimator combination!

`ComplexityMeasures.Dispersion`

— Type`Dispersion(; c = 5, m = 2, τ = 1, check_unique = true)`

An `OutcomeSpace`

based on dispersion patterns, originally used by Rostaghi2016 to compute the "dispersion entropy", which characterizes the complexity and irregularity of a time series.

Recommended parameter values Li2018 are `m ∈ [2, 3]`

, `τ = 1`

for the embedding, and `c ∈ [3, 4, …, 8]`

categories for the Gaussian symbol mapping.

**Description**

Assume we have a univariate time series $X = \{x_i\}_{i=1}^N$. First, this time series is encoded into a symbol timeseries $S$ using the Gaussian encoding `GaussianCDFEncoding`

with empirical mean `μ`

and empirical standard deviation `σ`

(both determined from $X$), and `c`

as given to `Dispersion`

.

Then, $S$ is embedded into an $m$-dimensional time series, using an embedding lag of $\tau$, which yields a total of $N - (m - 1)\tau$ delay vectors $z_i$, or "dispersion patterns". Since each element of $z_i$ can take on `c`

different values, and each delay vector has `m`

entries, there are `c^m`

possible dispersion patterns. This number is used for normalization when computing dispersion entropy.

The returned probabilities are simply the frequencies of the unique dispersion patterns present in $S$ (i.e., the `UniqueElements`

of $S$).

**Outcome space**

The outcome space for `Dispersion`

is the unique delay vectors whose elements are the the symbols (integers) encoded by the Gaussian CDF, i.e., the unique elements of $S$.

**Data requirements and parameters**

The input must have more than one unique element for the Gaussian mapping to be well-defined. Li2018 recommends that `x`

has at least 1000 data points.

If `check_unique == true`

(default), then it is checked that the input has more than one unique value. If `check_unique == false`

and the input only has one unique element, then a `InexactError`

is thrown when trying to compute probabilities.

Each embedding vector is called a "dispersion pattern". Why? Let's consider the case when $m = 5$ and $c = 3$, and use some very imprecise terminology for illustration:

When $c = 3$, values clustering far below mean are in one group, values clustered around the mean are in one group, and values clustering far above the mean are in a third group. Then the embedding vector $[2, 2, 2, 2, 2]$ consists of values that are close together (close to the mean), so it represents a set of numbers that are not very spread out (less dispersed). The embedding vector $[1, 1, 2, 3, 3]$, however, represents numbers that are much more spread out (more dispersed), because the categories representing "outliers" both above and below the mean are represented, not only values close to the mean.

For a version of this estimator that can be used on high-dimensional arrays, see `SpatialDispersion`

.

**Implements**

`codify`

. Used for encoding inputs where ordering matters (e.g. time series).

`ComplexityMeasures.Diversity`

— Type`Diversity`

An alias to `CosineSimilarityBinning`

.

`ComplexityMeasures.Ebrahimi`

— Type```
Ebrahimi <: DifferentialInfoEstimator
Ebrahimi(definition = Shannon(); m::Int = 1)
```

The `Ebrahimi`

estimator computes the `Shannon`

`information`

of a timeseries using the method from Ebrahimi1994, with logarithms to the `base`

specified in `definition`

.

The `Ebrahimi`

estimator belongs to a class of differential entropy estimators based on order statistics. It only works for *timeseries* input.

**Description**

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function$f : \mathbb{R} \to \mathbb{R}$. `Ebrahimi`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The `Ebrahimi`

`Shannon`

differential entropy estimate is then

\[\hat{H}_{E}(\bar{X}, m) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{n}{c_i m} (\bar{X}_{(i+m)} - \bar{X}_{(i-m)}) \right],\]

where

\[c_i = \begin{cases} 1 + \frac{i - 1}{m}, & 1 \geq i \geq m \\ 2, & m + 1 \geq i \geq n - m \\ 1 + \frac{n - i}{m} & n - m + 1 \geq i \geq n \end{cases}.\]

See also: `information`

, `Correa`

, `AlizadehArghami`

, `Vasicek`

, `DifferentialInfoEstimator`

.

`ComplexityMeasures.ElectronicEntropy`

— Type```
ElectronicEntropy <: InformationMeasure
ElectronicEntropy(; h = Shannon(; base = 2), j = ShannonExtropy(; base = 2))
```

The "electronic entropy" measure is defined in discrete form in Lad2015 as

\[H_{EL}(p) = H_S(p) + J_S(P),\]

where $H_S(p)$ is the `Shannon`

entropy and $J_S(p)$ is the `ShannonExtropy`

extropy of the probability vector $p$.

`ComplexityMeasures.Encoding`

— Type`Encoding`

The supertype for all encoding schemes. Encodings always encode elements of input data into the positive integers. The encoding API is defined by the functions `encode`

and `decode`

. Some probability estimators utilize encodings internally.

Current available encodings are:

`OrdinalPatternEncoding`

.`GaussianCDFEncoding`

.`RectangularBinEncoding`

.`RelativeMeanEncoding`

.`RelativeFirstDifferenceEncoding`

.`UniqueElementsEncoding`

.`BubbleSortSwapsEncoding`

.`PairDistanceEncoding`

.`CombinationEncoding`

, which can combine any of the above encodings.

`ComplexityMeasures.Entropy`

— Type`abstract type Entropy <: InformationMeasure end`

Abstract subtype of `InformationMeasure`

. It only exists to perform a sanity check when calling the `entropy`

function.

`ComplexityMeasures.FixedRectangularBinning`

— Type`FixedRectangularBinning(range::AbstractRange, D::Int = 1, precise = false)`

This is a convenience method where each dimension of the binning has the same range and the input data are `D`

dimensional, which defaults to 1 (timeseries).

`ComplexityMeasures.FixedRectangularBinning`

— Type```
FixedRectangularBinning <: AbstractBinning
FixedRectangularBinning(ranges::Tuple{<:AbstractRange...}, precise = false)
```

Rectangular box partition of state space where the partition along each dimension is explicitly given by each range `ranges`

, which is a tuple of `AbstractRange`

subtypes. Typically, each range is the output of the `range`

Base function, e.g., `ranges = (0:0.1:1, range(0, 1; length = 101), range(2.1, 3.2; step = 0.33))`

. All ranges must be sorted.

The optional second argument `precise`

dictates whether Julia Base's `TwicePrecision`

is used for when searching where a point falls into the range. Useful for edge cases of points being almost exactly on the bin edges, but it is exactly four times as slow, so by default it is `false`

.

Points falling outside the partition do not contribute to probabilities. Bins are always left-closed-right-open: `[a, b)`

. **This means that the last value of each of the ranges dictates the last right-closing value.** This value does *not* belong to the histogram! E.g., if given a range `r = range(0, 1; length = 11)`

, with `r[end] = 1`

, the value `1`

is outside the partition and would not attribute any increase of the probability corresponding to the last bin (here `[0.9, 1)`

)!

**Equivalently, the size of the histogram is histsize = map(r -> length(r)-1, ranges)!**

`FixedRectangularBinning`

leads to a well-defined outcome space without knowledge of input data, see `ValueBinning`

.

`ComplexityMeasures.FluctuationComplexity`

— Type```
FluctuationComplexity <: InformationMeasure
FluctuationComplexity(; definition = Shannon(; base = 2), base = 2)
```

The "fluctuation complexity" quantifies the standard deviation of the information content of the states $\omega_i$ around some summary statistic (`InformationMeasure`

) of a PMF. Specifically, given some outcome space $\Omega$ with outcomes $\omega_i \in \Omega$ and a probability mass function $p(\Omega) = \{ p(\omega_i) \}_{i=1}^N$, it is defined as

\[\sigma_I(p) := \sqrt{\sum_{i=1}^N p_i(I_i - H_*)^2}\]

where $I_i = -\log_{base}(p_i)$ is the information content of the i-th outcome. The type of information measure $*$ is controlled by `definition`

.

The `base`

controls the base of the logarithm that goes into the information content terms. Make sure that you pick a `base`

that is consistent with the base chosen for the `definition`

(relevant for e.g. `Shannon`

).

**Properties**

If `definition`

is the `Shannon`

entropy, then we recover the Shannon-type information fluctuation complexity from Bates1993. Then the fluctuation complexity is zero for PMFs with only a single non-zero element, or for the uniform distribution.

If `definition`

is not Shannon entropy, then the properties of the measure varies, and does not necessarily share the properties Bates1993.

As far as we know, using other information measures besides Shannon entropy for the fluctuation complexity hasn't been explored in the literature yet. Our implementation, however, allows for it. Please inform us if you try some new combinations!

`ComplexityMeasures.Gao`

— Type```
Gao <: DifferentialInfoEstimator
Gao(definition = Shannon(); k = 1, w = 0, corrected = true)
```

The `Gao`

estimator Gao2015 computes the `Shannon`

differential `information`

, using a `k`

-th nearest-neighbor approach based on Singh2003, with logarithms to the `base`

specified in `definition`

.

`w`

is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`

, meaning that only the point itself is excluded when searching for neighbours).

Gao2015 give two variants of this estimator. If `corrected == false`

, then the uncorrected version is used. If `corrected == true`

, then the corrected version is used, which ensures that the estimator is asymptotically unbiased.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. `KozachenkoLeonenko`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

`ComplexityMeasures.GaussianCDFEncoding`

— Type```
GaussianCDFEncoding <: Encoding
GaussianCDFEncoding{m}(; μ, σ, c::Int = 3)
```

An encoding scheme that `encode`

s a scalar or vector `χ`

into one of the integers `sᵢ ∈ [1, 2, …, c]`

based on the normal cumulative distribution function (NCDF), and `decode`

s the `sᵢ`

into subintervals of `[0, 1]`

(with some loss of information).

**Initializing a GaussianCDFEncoding**

The size of the input to be encoded must be known beforehand. One must therefore set `m = length(χ)`

, where `χ`

is the input (`m = 1`

for scalars, `m ≥ 2`

for vectors). To do so, one must explicitly give `m`

as a type parameter: e.g. `encoding = GaussianCDFEncoding{3}(; μ = 0.0, σ = 0.1)`

to encode 3-element vectors, or `encoding = GaussianCDFEncoding{1}(; μ = 0.0, σ = 0.1)`

to encode scalars.

**Description**

**Encoding/decoding scalars**

`GaussianCDFEncoding`

first maps an input scalar $χ$ to a new real number $y_ \in [0, 1]$ by using the normal cumulative distribution function (CDF) with the given mean `μ`

and standard deviation `σ`

, according to the map

\[x \to y : y = \dfrac{1}{ \sigma \sqrt{2 \pi}} \int_{-\infty}^{x} e^{(-(x - \mu)^2)/(2 \sigma^2)} dx.\]

Next, the interval `[0, 1]`

is equidistantly binned and enumerated $1, 2, \ldots, c$, and $y$ is linearly mapped to one of these integers using the linear map $y \to z : z = \text{floor}(y(c-1)) + 1$.

Because of the floor operation, some information is lost, so when used with `decode`

, each decoded `sᵢ`

is mapped to a *subinterval* of `[0, 1]`

. This subinterval is returned as a length-`1`

`Vector{SVector}`

.

Notice that the decoding step does not yield an element of any outcome space of the estimators that use `GaussianCDFEncoding`

internally, such as `Dispersion`

. That is because these estimators additionally delay embed the encoded data.

**Encoding/decoding vectors**

If `GaussianCDFEncoding`

is used with a vector `χ`

, then each element of `χ`

is encoded separately, resulting in a `length(χ)`

sequence of integers which may be treated as a `CartesianIndex`

. The encoded symbol `s ∈ [1, 2, …, c]`

is then just the linear index corresponding to this cartesian index (similar to how `CombinationEncoding`

works).

When `decode`

d, the integer symbol `s`

is converted back into its `CartesianIndex`

representation, which is just a sequence of integers that refer to subdivisions of the `[0, 1]`

interval. The relevant subintervals are then returned as a length-`χ`

`Vector{SVector}`

.

**Examples**

```
julia> using ComplexityMeasures, Statistics
julia> x = [0.1, 0.4, 0.7, -2.1, 8.0];
julia> μ, σ = mean(x), std(x); encoding = GaussianCDFEncoding(; μ, σ, c = 5)
julia> es = encode.(Ref(encoding), x)
5-element Vector{Int64}:
2
2
3
1
5
julia> decode(encoding, 3)
2-element SVector{2, Float64} with indices SOneTo(2):
0.4
0.6
```

`ComplexityMeasures.GeneralizedSchuermann`

— Type```
GeneralizedSchuermann <: DiscreteInfoEstimatorShannon
GeneralizedSchuermann(definition = Shannon(); a = 1.0)
```

The `GeneralizedSchuermann`

estimator is used with `information`

to compute the discrete `Shannon`

entropy with the bias-corrected estimator given in Grassberger2022.

The "generalized" part of the name, as opposed to the Schurmann2004 estimator (`Schuermann`

), is due to the possibility of picking difference parameters $a_i$ for different outcomes. If different parameters are assigned to the different outcomes, `a`

must be a vector of parameters of length `length(outcomes)`

, where the outcomes are obtained using `outcomes`

. See Grassberger2022 for more information. If `a`

is a real number, then $a_i = a \forall i$, and the estimator reduces to the `Schuermann`

estimator.

**Description**

For a set of $N$ observations over $M$ outcomes, the estimator is given by

\[H_S^{opt} = \varphi(N) - \dfrac{1}{N} \sum_{i=1}^M n_i G_{n_i}(a_i),\]

where $n_i$ is the observed frequency of the i-th outcome,

\[G_n(a) = \varphi(n) + (-1)^n \int_0^a \dfrac{x^{n - 1}}{x + 1} dx,\]

$G_n(1) = G_n$ and $G_n(0) = \varphi(n)$, and

\[G_n = \varphi(n) + (-1)^n \int_0^1 \dfrac{x^{n - 1}}{x + 1} dx.\]

`ComplexityMeasures.Goria`

— Type```
Goria <: DifferentialInfoEstimator
Goria(measure = Shannon(); k = 1, w = 0)
```

The `Goria`

estimator Goria2005 computes the `Shannon`

differential `information`

of a multi-dimensional `StateSpaceSet`

, with logarithms to the `base`

specified in `definition`

.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. `Goria`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

Specifically, let $\bf{n}_1, \bf{n}_2, \ldots, \bf{n}_N$ be the distance of the samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ to their `k`

-th nearest neighbors. Next, let the geometric mean of the distances be

\[\hat{\rho}_k = \left( \prod_{i=1}^N \right)^{\dfrac{1}{N}}\]

Goria2005's estimate of Shannon differential entropy is then

\[\hat{H} = m\hat{\rho}_k + \log(N - 1) - \psi(k) + \log c_1(m),\]

where $c_1(m) = \dfrac{2\pi^\frac{m}{2}}{m \Gamma(m/2)}$ and $\psi$ is the digamma function.

`ComplexityMeasures.HorvitzThompson`

— Type```
HorvitzThompson <: DiscreteInfoEstimatorShannon
HorvitzThompson(measure::Shannon = Shannon())
```

The `HorvitzThompson`

estimator is used with `information`

to compute the discrete `Shannon`

entropy according to Horvitz1952.

**Description**

The Horvitz-Thompson estimator of `Shannon`

entropy is given by

\[H_S^{HT} = -\sum_{i=1}^M \dfrac{p_i \log(p_i) }{1 - (1 - p_i)^N},\]

where $N$ is the sample size and $M$ is the number of `outcomes`

. Given the true probability $p_i$ of the $i$-th outcome, $1 - (1 - p_i)^N$ is the probability that the outcome appears at least once in a sample of size $N$ Arora2022. Dividing by this inclusion probability is a form of weighting, and compensates for situations where certain outcomes have so low probabilities that they are not often observed in a sample, for example in power-law distributions.

`ComplexityMeasures.Identification`

— Type```
Identification <: InformationMeasure
Identification()
```

Identification entropy Ahlswede2006.

**Description**

The identification entropy is the functional

\[H_I(p) = 2\left( 1 - \sum_{i=1}^N p_i^2 \right).\]

Details about this entropy definition can be found in Ahlswede2021.

`ComplexityMeasures.InformationMeasure`

— Type`InformationMeasure`

`InformationMeasure`

is the supertype of all information measure *definitions*.

In this package, we define "information measures" as functionals of probability mass functions ("discrete" measures), or of probability density functions ("differential" measures). Examples are (generalized) entropies such as `Shannon`

or `Renyi`

, or extropies like `ShannonExtropy`

. Amigó2018 provides a useful review of generalized entropies.

**Used with**

Any of the information measures listed below can be used with

`information`

, to compute a numerical value for the measure, given some input data.`information_maximum`

, to compute the maximum possible value for the measure.`information_normalized`

, to compute the normalized form of the measure (divided by the maximum possible value).

The `information_maximum`

/`information_normalized`

functions only works with the discrete version of the measure. See docstrings for the above functions for usage examples.

**Implementations**

`Renyi`

.`Tsallis`

.`Shannon`

, which is a subcase of the above two in the limit`q → 1`

.`Kaniadakis`

.`Curado`

.`StretchedExponential`

.`RenyiExtropy`

.`TsallisExtropy`

.`ShannonExtropy`

, which is a subcase of the above two in the limit`q → 1`

.`FluctuationComplexity`

.

**Estimators**

A particular information measure may have both a discrete and a continuous/differential definition, which are estimated using a `DifferentialInfoEstimator`

or a `DifferentialInfoEstimator`

, respectively.

`ComplexityMeasures.InformationMeasureEstimator`

— Type`InformationMeasureEstimator{I <: InformationMeasure}`

The supertype of all information measure estimators. Its direct subtypes are `DiscreteInfoEstimator`

and `DifferentialInfoEstimator`

.

Since all estimators must reference a measure definition in some way, we made the following interface decisions:

- all estimators have as first type parameter
`I <: InformationMeasure`

- all estimators reference the information measure in a
`definition`

field - all estimators are defined using
`Base.@kwdef`

so that they may be initialized with the syntax`Estimator(; definition = Shannon())`

(or any other).

Any concrete subtypes must follow the above, e.g.:

```
Base.@kwdef struct MyEstimator{I <: InformationMeasure, X} <: DiscreteInfoEstimator{I}
definition::I
x::X
end
```

In real applications, we generally don't have access to the underlying probability mass functions or densities required to compute the various entropy or extropy definitons. Therefore, these information measures must be *estimated* from finite data. Estimating a particular measure (e.g. `Shannon`

entropy) can be done in many ways, each with its own own pros and cons. We aim to provide a complete library of literature estimators of the various information measures (PRs are welcome!).

`ComplexityMeasures.InvariantMeasure`

— Type`InvariantMeasure(to, ρ)`

Minimal return struct for `invariantmeasure`

that contains the estimated invariant measure `ρ`

, as well as the transfer operator `to`

from which it is computed (including bin information).

See also: `invariantmeasure`

.

`ComplexityMeasures.Jackknife`

— Type```
Jackknife <: DiscreteInfoEstimatorGeneric
Jackknife(definition::InformationMeasure = Shannon())
```

The `Jackknife`

estimator is used with `information`

to compute any discrete `InformationMeasure`

.

The `Jackknife`

estimator uses the generic jackknife principle to reduce bias. Zahl1977 was the first to apply the jaccknife technique in the context of `Shannon`

entropy estimation. Here, we've generalized his estimator to work with any `InformationMeasure`

.

**Description**

As an example of the jackknife technique, here is the formula for a jackknife estimate of `Shannon`

entropy

\[H_S^{J} = N H_S^{plugin} - \dfrac{N-1}{N} \sum_{i=1}^N {H_S^{plugin}}^{-\{i\}},\]

where $N$ is the sample size, $H_S^{plugin}$ is the plugin estimate of Shannon entropy, and ${H_S^{plugin}}^{-\{i\}}$ is the plugin estimate, but computed with the $i$-th sample left out.

`ComplexityMeasures.Kaniadakis`

— Type```
Kaniadakis <: InformationMeasure
Kaniadakis(; κ = 1.0, base = 2.0)
```

The Kaniadakis entropy Tsallis2009, used with `information`

to compute

\[H_K(p) = -\sum_{i=1}^N p_i f_\kappa(p_i),\]

\[f_\kappa (x) = \dfrac{x^\kappa - x^{-\kappa}}{2\kappa},\]

where if $\kappa = 0$, regular logarithm to the given `base`

is used, and 0 probabilities are skipped.

`ComplexityMeasures.KozachenkoLeonenko`

— Type```
KozachenkoLeonenko <: DifferentialInfoEstimator
KozachenkoLeonenko(definition = Shannon(); w::Int = 0)
```

The `KozachenkoLeonenko`

estimator KozachenkoLeonenko1987 computes the `Shannon`

differential `information`

of a multi-dimensional `StateSpaceSet`

, with logarithms to the `base`

specified in `definition`

.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. `KozachenkoLeonenko`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))]\]

using the nearest neighbor method from KozachenkoLeonenko1987, as described in Charzyńska2015.

`w`

is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`

, meaning that only the point itself is excluded when searching for neighbours).

In contrast to `Kraskov`

, this estimator uses only the *closest* neighbor.

See also: `information`

, `Kraskov`

, `DifferentialInfoEstimator`

.

`ComplexityMeasures.Kraskov`

— Type```
Kraskov <: DifferentialInfoEstimator
Kraskov(definition = Shannon(); k::Int = 1, w::Int = 0)
```

The `Kraskov`

estimator computes the `Shannon`

differential `information`

of a multi-dimensional `StateSpaceSet`

using the `k`

-th nearest neighbor searches method from Kraskov2004, with logarithms to the `base`

specified in `definition`

.

`w`

is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`

, meaning that only the point itself is excluded when searching for neighbours).

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. `Kraskov`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

See also: `information`

, `KozachenkoLeonenko`

, `DifferentialInfoEstimator`

.

`ComplexityMeasures.LempelZiv76`

— Type```
LempelZiv76 <: ComplexityEstimator
LempelZiv76()
```

The Lempel-Ziv, or `LempelZiv76`

, complexity measure LempelZiv1976, which is used with `complexity`

and `complexity_normalized`

.

For results to be comparable across sequences with different length, use the normalized version. Normalized `LempelZiv76`

-complexity is implemented as given in Amigó2004. The normalized measure is close to zero for very regular signals, while for random sequences, it is close to 1 with high probability^{[Amigó2004]}. Note: the normalized `LempelZiv76`

complexity can be higher than 1^{[Amigó2004]}.

The `LempelZiv76`

measure applies only to binary sequences, i.e. sequences with a two-element alphabet (precisely two distinct outcomes). For performance optimization, we do not check the number of unique elements in the input. If your input sequence is not binary, you must `encode`

it first using one of the implemented `Encoding`

schemes (or encode your data manually).

`ComplexityMeasures.LeonenkoProzantoSavani`

— Type```
LeonenkoProzantoSavani <: DifferentialInfoEstimator
LeonenkoProzantoSavani(definition = Shannon(); k = 1, w = 0)
```

The `LeonenkoProzantoSavani`

estimator LeonenkoProzantoSavani2008 computes the `Shannon`

, `Renyi`

, or `Tsallis`

differential `information`

of a multi-dimensional `StateSpaceSet`

, with logarithms to the `base`

specified in `definition`

.

**Description**

The estimator uses `k`

-th nearest-neighbor searches. `w`

is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`

, meaning that only the point itself is excluded when searching for neighbours).

For details, see LeonenkoProzantoSavani2008.

`ComplexityMeasures.Lord`

— Type```
Lord <: DifferentialInfoEstimator
Lord(measure = Shannon(); k = 10, w = 0)
```

The `Lord`

estimator Lord2018 estimates the `Shannon`

differential `information`

using a nearest neighbor approach with a local nonuniformity correction (LNC), with logarithms to the `base`

specified in `definition`

.

`w`

is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`

, meaning that only the point itself is excluded when searching for neighbours).

**Description**

Assume we have samples $\bar{X} = \{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. `Lord`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))],\]

by using the resubstitution formula

\[\hat{\bar{X}, k} = -\mathbb{E}[\log(f(X))] \approx \sum_{i = 1}^N \log(\hat{f}(\bf{x}_i)),\]

where $\hat{f}(\bf{x}_i)$ is an estimate of the density at $\bf{x}_i$ constructed in a manner such that $\hat{f}(\bf{x}_i) \propto \dfrac{k(x_i) / N}{V_i}$, where $k(x_i)$ is the number of points in the neighborhood of $\bf{x}_i$, and $V_i$ is the volume of that neighborhood.

While most nearest-neighbor based differential entropy estimators uses regular volume elements (e.g. hypercubes, hyperrectangles, hyperspheres) for approximating the local densities $\hat{f}(\bf{x}_i)$, the `Lord`

estimator uses hyperellopsoid volume elements. These hyperellipsoids are, for each query point `xᵢ`

, estimated using singular value decomposition (SVD) on the `k`

-th nearest neighbors of `xᵢ`

. Thus, the hyperellipsoids stretch/compress in response to the local geometry around each sample point. This makes `Lord`

a well-suited entropy estimator for a wide range of systems.

`ComplexityMeasures.MillerMadow`

— Type```
MillerMadow <: DiscreteInfoEstimatorShannon
MillerMadow(measure::Shannon = Shannon())
```

The `MillerMadow`

estimator is used with `information`

to compute the discrete `Shannon`

entropy according to Miller1955.

**Description**

The Miller-Madow estimator of Shannon entropy is given by

\[H_S^{MM} = H_S^{plugin} + \dfrac{m - 1}{2N},\]

where $H_S^{plugin}$ is the Shannon entropy estimated using the `PlugIn`

estimator, `m`

is the number of bins with nonzero probability (as defined in Paninski2003), and `N`

is the number of observations.

`ComplexityMeasures.MissingDispersionPatterns`

— Type```
MissingDispersionPatterns <: ComplexityEstimator
MissingDispersionPatterns(o = Dispersion()) → mdp
```

An estimator for the number of missing dispersion patterns (MDP), a complexity measure which can be used to detect nonlinearity in time series Zhou2023.

Used with `complexity`

or `complexity_normalized`

.

**Description**

When used with `complexity`

, `complexity(mdp)`

is syntactically equivalent with just `missing_outcomes`

`(o)`

. When used with `complexity_normalized`

, the normalization is simply `missing_outcomes(o)/total_outcomes(o)`

.

`Dispersion`

's linear mapping from CDFs to integers is based on equidistant partitioning of the interval `[0, 1]`

. This is slightly different from Zhou2023, which uses the linear mapping $s_i := \text{round}(y + 0.5)$.

**Usage**

In Zhou2023, `MissingDispersionPatterns`

is used to detect nonlinearity in time series by comparing the MDP for a time series `x`

to values for an ensemble of surrogates of `x`

, as per the standard analysis of TimeseriesSurrogates.jl If the MDP value of $x$ is significantly larger than some high quantile of the surrogate distribution, then it is taken as evidence for nonlinearity.

See also: `Dispersion`

, `ReverseDispersion`

, `total_outcomes`

.

`ComplexityMeasures.MultiScaleAlgorithm`

— Type`MultiScaleAlgorithm`

The supertype for all multiscale coarse-graining/downsampling algorithms. Concrete subtypes are:

`ComplexityMeasures.NaiveKernel`

— Type`NaiveKernel(ϵ::Real; method = KDTree, w = 0, metric = Euclidean()) <: OutcomeSpace`

An `OutcomeSpace`

based on a "naive" kernel density estimation approach (KDE), as discussed in PrichardTheiler1995.

Probabilities $P(\mathbf{x}, \epsilon)$ are assigned to every point $\mathbf{x}$ by counting how many other points occupy the space spanned by a hypersphere of radius `ϵ`

around $\mathbf{x}$, according to:

\[P_i( X, \epsilon) \approx \dfrac{1}{N} \sum_{s} B(||X_i - X_j|| < \epsilon),\]

where $B$ gives 1 if the argument is `true`

. Probabilities are then normalized.

**Keyword arguments**

`method = KDTree`

: the search structure supported by Neighborhood.jl. Specifically, use`KDTree`

to use a tree-based neighbor search, or`BruteForce`

for the direct distances between all points. KDTrees heavily outperform direct distances when the dimensionality of the data is much smaller than the data length.`w = 0`

: the Theiler window, which excludes indices $s$ that are within $|i - s| ≤ w$ from the given point $x_i$.`metric = Euclidean()`

: the distance metric.

**Outcome space**

The outcome space `Ω`

for `NaiveKernel`

are the indices of the input data, `eachindex(x)`

. Hence, input `x`

is needed for a well-defined `outcome_space`

. The reason to not return the data points themselves is because duplicate data points may not get assigned same probabilities (due to having different neighbors).

`ComplexityMeasures.OrdinalOutcomeSpace`

— TypeThe supertype for probability estimators based on permutation patterns.

Subtypes must implement fields:

`m::Int`

: The dimension of the permutation patterns.`lt::Function`

: A function determining how ties are to be broken when constructing permutation patterns from embedding vectors.

`ComplexityMeasures.OrdinalPatternEncoding`

— Type```
OrdinalPatternEncoding <: Encoding
OrdinalPatternEncoding{m}(lt = ComplexityMeasures.isless_rand)
```

An encoding scheme that `encode`

s length-`m`

vectors into their permutation/ordinal patterns and then into the integers based on the Lehmer code. It is used by `OrdinalPatterns`

and similar estimators, see that for a description of the outcome space.

The ordinal/permutation pattern of a vector `χ`

is simply `sortperm(χ)`

, which gives the indices that would sort `χ`

in ascending order.

**Description**

The Lehmer code, as implemented here, is a bijection between the set of `factorial(m)`

possible permutations for a length-`m`

sequence, and the integers `1, 2, …, factorial(m)`

. The encoding step uses algorithm 1 in Berger2019, which is highly optimized. The decoding step is much slower due to missing optimizations (pull requests welcomed!).

**Example**

```
julia> using ComplexityMeasures
julia> χ = [4.0, 1.0, 9.0];
julia> c = OrdinalPatternEncoding(3);
julia> i = encode(c, χ)
3
julia> decode(c, i)
3-element SVector{3, Int64} with indices SOneTo(3):
2
1
3
```

If you want to encode something that is already a permutation pattern, then you can use the non-exported `permutation_to_integer`

function.

`ComplexityMeasures.OrdinalPatterns`

— Type```
OrdinalPatterns <: OutcomeSpace
OrdinalPatterns{m}(τ = 1, lt::Function = ComplexityMeasures.isless_rand)
```

An `OutcomeSpace`

based on lengh-`m`

ordinal permutation patterns, originally introduced in BandtPompe2002's paper on permutation entropy. Note that `m`

is given as a type parameter, so that when it is a literal integer there are performance accelerations.

When passed to `probabilities`

the output depends on the input data type:

**Univariate data**. If applied to a univariate timeseries (`AbstractVector`

), then the timeseries is first embedded using embedding delay`τ`

and dimension`m`

, resulting in embedding vectors $\{ \bf{x}_i \}_{i=1}^{N-(m-1)\tau}$. Then, for each $\bf{x}_i$, we find its permutation pattern $\pi_{i}$. Probabilities are then estimated as the frequencies of the encoded permutation symbols by using`UniqueElements`

. When giving the resulting probabilities to`information`

, the original permutation entropy is computed BandtPompe2002.**Multivariate data**. If applied to a an`D`

-dimensional`StateSpaceSet`

, then no embedding is constructed,`m`

must be equal to`D`

and`τ`

is ignored. Each vector $\bf{x}_i$ of the dataset is mapped directly to its permutation pattern $\pi_{i}$ by comparing the relative magnitudes of the elements of $\bf{x}_i$. Like above, probabilities are estimated as the frequencies of the permutation symbols. The resulting probabilities can be used to compute multivariate permutation entropy He2016, although here we don't perform any further subdivision of the permutation patterns (as in Figure 3 of He2016).

Internally, `OrdinalPatterns`

uses the `OrdinalPatternEncoding`

to represent ordinal patterns as integers for efficient computations.

See `WeightedOrdinalPatterns`

and `AmplitudeAwareOrdinalPatterns`

for estimators that not only consider ordinal (sorting) patterns, but also incorporate information about within-state-vector amplitudes. For a version of this estimator that can be used on spatial data, see `SpatialOrdinalPatterns`

.

In BandtPompe2002, equal values are ordered after their order of appearance, but this can lead to erroneous temporal correlations, especially for data with low amplitude resolution Zunino2017. Here, by default, if two values are equal, then one of the is randomly assigned as "the largest", using `lt = ComplexityMeasures.isless_rand`

. To get the behaviour from BandtPompe2002, use `lt = Base.isless`

.

**Outcome space**

The outcome space `Ω`

for `OrdinalPatterns`

is the set of length-`m`

ordinal patterns (i.e. permutations) that can be formed by the integers `1, 2, …, m`

. There are `factorial(m)`

such patterns.

For example, the outcome `[2, 3, 1]`

corresponds to the ordinal pattern of having the smallest value in the second position, the next smallest value in the third position, and the next smallest, i.e. the largest value in the first position. See also `OrdinalPatternEncoding`

.

**In-place symbolization**

`OrdinalPatterns`

also implements the in-place `probabilities!`

for `StateSpaceSet`

input (or embedded vector input) for reducing allocations in looping scenarios. The length of the pre-allocated symbol vector must be the length of the dataset. For example

```
using ComplexityMeasures
m, N = 2, 100
est = OrdinalPatterns{m}(τ)
x = StateSpaceSet(rand(N, m)) # some input dataset
πs_ts = zeros(Int, N) # length must match length of `x`
p = probabilities!(πs_ts, est, x)
```

`ComplexityMeasures.Outcome`

— Type```
Outcome <: Number
Outcome(num::Integer)
```

A convenience wrapper around around an `Integer`

that represents an unspecified but enumerated outcome. It exists to distinguish the case of generic outcomes (which are allocated when using `counts`

) vs actual integer outcomes (which may be allocated when using `counts_and_outcomes`

). It is also used for pretty-printing `Counts`

and `Probabilities`

.

`ComplexityMeasures.OutcomeSpace`

— Type`OutcomeSpace`

The supertype for all outcome space implementation.

**Description**

In ComplexityMeasures.jl, an outcome space defines a set of possible outcomes $\Omega = \{\omega_1, \omega_2, \ldots, \omega_L \}$ (some form of discretization). In the literature, the outcome space is often also called an "alphabet", while each outcome is called a "symbol" or an "event".

An outcome space also defines a set of rules for mapping input data to to each outcome $\omega_i$, a processes called *encoding* or *symbolizing* or *discretizing* in the literature (see encodings). Some `OutcomeSpace`

s first apply a transformation, e.g. a delay embedding, to the data before discretizing/encoding, while other `OutcomeSpace`

s discretize/encode the data directly.

**Implementations**

Outcome space | Principle | Input data | Counting-compatible |
---|---|---|---|

`UniqueElements` | Count of unique elements | `Any` | ✔ |

`ValueBinning` | Binning (histogram) | `Vector` , `StateSpaceSet` | ✔ |

`OrdinalPatterns` | Ordinal patterns | `Vector` , `StateSpaceSet` | ✔ |

`SpatialOrdinalPatterns` | Ordinal patterns in space | `Array` | ✔ |

`Dispersion` | Dispersion patterns | `Vector` | ✔ |

`SpatialDispersion` | Dispersion patterns in space | `Array` | ✔ |

`CosineSimilarityBinning` | Cosine similarity | `Vector` | ✔ |

`BubbleSortSwaps` | Swap counts when sorting | `Vector` | ✔ |

`SequentialPairDistances` | Sequential state vector distances | `Vector` , `StateSpaceSet` | ✔ |

`TransferOperator` | Binning (transfer operator) | `Vector` , `StateSpaceSet` | ✖ |

`NaiveKernel` | Kernel density estimation | `StateSpaceSet` | ✖ |

`WeightedOrdinalPatterns` | Ordinal patterns | `Vector` , `StateSpaceSet` | ✖ |

`AmplitudeAwareOrdinalPatterns` | Ordinal patterns | `Vector` , `StateSpaceSet` | ✖ |

`WaveletOverlap` | Wavelet transform | `Vector` | ✖ |

`PowerSpectrum` | Fourier transform | `Vector` | ✖ |

In the column "input data" it is assumed that the `eltype`

of the input is `<: Real`

.

**Usage**

Outcome spaces are used as input to

`probabilities`

/`allprobabilities_and_outcomes`

for computing probability mass functions.`outcome_space`

, which returns the elements of the outcome space.`total_outcomes`

, which returns the cardinality of the outcome space.`counts`

/`counts_and_outcomes`

/`allcounts_and_outcomes`

, for obtaining raw counts instead of probabilities (only for counting-compatible outcome spaces).

**Counting-compatible vs. non-counting compatible outcome spaces**

There are two main types of outcome spaces.

- Counting-compatible outcome spaces have a well-defined way of counting how often each point in the (encoded) input data is mapped to a particular outcome $\omega_i$. These outcome spaces use
`encode`

to discretize the input data. Examples are`OrdinalPatterns`

(which encodes input data into ordinal patterns) or`ValueBinning`

(which discretizes points onto a regular grid). The table below lists which outcome spaces are counting compatible. - Non-counting compatible outcome spaces have no well-defined way of counting explicitly how often each point in the input data is mapped to a particular outcome $\omega_i$. Instead, these outcome spaces returns a vector of pre-normalized "relative counts", one for each outcome $\omega_i$. Examples are
`WaveletOverlap`

or`PowerSpectrum`

.

Counting-compatible outcome spaces can be used with *any* `ProbabilitiesEstimator`

to convert counts into probability mass functions. Non-counting-compatible outcome spaces can only be used with the maximum likelihood (`RelativeAmount`

) probabilities estimator, which estimates probabilities precisely by the relative frequency of each outcome (formally speaking, the `RelativeAmount`

estimator also requires counts, but for the sake of code consistency, we allow it to be used with relative frequencies as well).

The function `is_counting_based`

can be used to check whether an outcome space is based on counting.

**Deducing the outcome space (from data)**

Some outcome space models can deduce $\Omega$ without knowledge of the input, such as `OrdinalPatterns`

. Other outcome spaces require knowledge of the input data for concretely specifying $\Omega$, such as `ValueBinning`

with `RectangularBinning`

. If `o`

is some outcome space model and `x`

some input data, then `outcome_space`

`(o, x)`

returns the possible outcomes $\Omega$. To get the cardinality of $\Omega$, use `total_outcomes`

.

**Implementation details**

The element type of $\Omega$ varies between outcome space models, but it is guaranteed to be *hashable* and *sortable*. This allows for conveniently tracking the counts of a specific event across experimental realizations, by using the outcome as a dictionary key and the counts as the value for that key (or, alternatively, the key remains the outcome and one has a vector of probabilities, one for each experimental realization).

`ComplexityMeasures.PairDistanceEncoding`

— Type```
PairDistanceEncoding <: Encoding
PairDistanceEncoding(min_dist, max_dist; n = 2, metric = Chebyshev(), precise = false)
```

An encoding that `encode`

s point pairs of the form `Tuple{<:AbstractVector, <:AbstractVector}`

by first computing their distance using the given `metric`

, then dividing the interval [`min_dist, max_dist]`

into `n`

equal-size bins, and mapping the computed distance onto one of those bins. Bins are enumerated as `1:n`

. When `decode`

-ing the bin integer, the left edge of the bin is returned.

`precise`

has the same meaning as in `RectangularBinEncoding`

.

**Example**

Let's create an example where the minimum and maximum allowed distance is known.

```
using ComplexityMeasures, Distances, StaticArrays
m = Chebyshev()
y = [SVector(1.0), SVector(0.5), SVector(0.25), SVector(0.64)]
pair1, pair2, pair3 = (y[1], y[2]), (y[2], y[3]), (y[3], y[4])
dmax = m(pair1...) # dist = 0.50
dmin = m(pair2...) # dist = 0.25
dmid = m(pair3...) # dist = 0.39
# This should give five bins with left adges at [0.25], [0.30], [0.35], [0.40] and [0.45]
encoding = PairDistanceEncoding(dmin, dmax; n = 5, metric = m)
c1 = encode(encoding, pair1) # 5
c2 = encode(encoding, pair2) # 1
c3 = encode(encoding, pair3) # 3
decode(encoding, c3) ≈ [0.35] # true
```

`ComplexityMeasures.PlugIn`

— Type`PlugIn(e::InformationMeasure) <: DiscreteInfoEstimatorGeneric`

The `PlugIn`

estimator is also called the empirical/naive/"maximum likelihood" estimator, and is used with `information`

to any discrete `InformationMeasure`

.

It computes any quantity exactly as given by its formula. When computing an information measure, which here is defined as a probabilities functional, it computes the quantity directly from a probability mass function, which is derived from maximum-likelihood (`RelativeAmount`

estimates of the probabilities.

**Bias of plug-in estimates**

The plugin-estimator of `Shannon`

entropy underestimates the true entropy, with a bias that grows with the number of distinct `outcomes`

(Arora et al., 2022)Arora2022,

\[bias(H_S^{plugin}) = -\dfrac{K-1}{2N} + o(N^-1).\]

where `K`

is the number of distinct outcomes, and `N`

is the sample size. Many authors have tried to remedy this by proposing alternative Shannon entropy estimators. For example, the `MillerMadow`

estimator is a simple correction to the plug-in estimator that adds back the bias term above. Many other estimators exist; see `DiscreteInfoEstimator`

s for an overview.

`ComplexityMeasures.PowerSpectrum`

— Type`PowerSpectrum() <: OutcomeSpace`

An `OutcomeSpace`

based on the power spectrum of a timeseries (amplitude square of its Fourier transform).

If used with `probabilities`

, then the spectrum normalized to sum = 1 is returned as probabilities. The Shannon entropy of these probabilities is typically referred in the literature as *spectral entropy*, e.g. Llanos2017 and Tian2017.

The closer the spectrum is to flat, i.e., white noise, the higher the entropy. However, you can't compare entropies of timeseries with different length, because the binning in spectral space depends on the length of the input.

**Outcome space**

The outcome space `Ω`

for `PowerSpectrum`

is the set of frequencies in Fourier space. They should be multiplied with the sampling rate of the signal, which is assumed to be `1`

. Input `x`

is needed for a well-defined `outcome_space`

.

`ComplexityMeasures.PrintComponent`

— Type```
PrintComponent
PrintComponent(s; color::Union{Symbol,Int} = :normal,
bold::Bool = false, underline::Bool = false, blink::Bool = false,
hidden::Bool = false, reverse::Bool = false)
```

Stores a string `s`

and instructions for how it shall be printed.

`PrintComponent`

s are intended for use with `printstyled`

.

`ComplexityMeasures.Probabilities`

— Type```
Probabilities <: Array{<:AbstractFloat, N}
Probabilities(probs::Array [, outcomes [, dimlabels]]) → p
Probabilities(counts::Counts [, outcomes [, dimlabels]]) → p
```

`Probabilities`

stores an `N`

-dimensional array of probabilities, while ensuring that the array sums to 1 (normalized probability mass). In most cases the array is a standard vector. `p`

itself can be manipulated and iterated over, just like its stored array.

The probabilities correspond to `outcomes`

that describe the axes of the array. If `p isa Probabilities`

, then `p.outcomes[i]`

is an an abstract vector containing the outcomes along the `i`

-th dimension. The outcomes have the same ordering as the probabilities, so that `p[i][j]`

is the probability for outcome `p.outcomes[i][j]`

. The dimensions of the array are named, and can be accessed by `p.dimlabels`

, where `p.dimlabels[i]`

is the label of the `i`

-th dimension. Both `outcomes`

and `dimlabels`

are assigned automatically if not given. If the input is a set of `Counts`

, and `outcomes`

and `dimlabels`

are not given, then the labels and outcomes are inherited from the counts.

**Examples**

```
julia> probs = [0.2, 0.2, 0.2, 0.2]; Probabilities(probs) # will be normalized to sum to 1
Probabilities{Float64,1} over 4 outcomes
Outcome(1) 0.25
Outcome(2) 0.25
Outcome(3) 0.25
Outcome(4) 0.25
```

```
julia> c = Counts([12, 16, 12], ["out1", "out2", "out3"]); Probabilities(c)
Probabilities{Float64,1} over 3 outcomes
"out1" 0.3
"out2" 0.4
"out3" 0.3
```

`ComplexityMeasures.ProbabilitiesEstimator`

— Type`ProbabilitiesEstimator`

The supertype for all probabilities estimators.

The role of the probabilities estimator is to convert (pseudo-)counts to probabilities. Currently, the implementation of all probabilities estimators assume *finite* outcome space with known cardinality. Therefore, `ProbabilitiesEstimator`

accept an `OutcomeSpace`

as the first argument, which specifies the set of possible outcomes.

Probabilities estimators are used with `probabilities`

and `allprobabilities_and_outcomes`

.

**Implementations**

The default probabilities estimator is `RelativeAmount`

, which is compatible with any `OutcomeSpace`

. The following estimators only support counting-based outcomes.

**Description**

In ComplexityMeasures.jl, probability mass functions are estimated from data by defining a set of possible outcomes $\Omega = \{\omega_1, \omega_2, \ldots, \omega_L \}$ (by specifying an `OutcomeSpace`

), and assigning to each outcome $\omega_i$ a probability $p(\omega_i)$, such that $\sum_{i=1}^N p(\omega_i) = 1$ (by specifying a `ProbabilitiesEstimator`

).

`ComplexityMeasures.RectangularBinEncoding`

— Type```
RectangularBinEncoding <: Encoding
RectangularBinEncoding(binning::RectangularBinning, x)
RectangularBinEncoding(binning::FixedRectangularBinning)
```

An encoding scheme that `encode`

s points `χ ∈ x`

into their histogram bins.

The first call signature simply initializes a `FixedRectangularBinning`

and then calls the second call signature.

See `FixedRectangularBinning`

for info on mapping points to bins.

`ComplexityMeasures.RectangularBinning`

— Type`RectangularBinning(ϵ, precise = false) <: AbstractBinning`

Rectangular box partition of state space using the scheme `ϵ`

, deducing the histogram extent and bin width from the input data.

`RectangularBinning`

is a convenience struct. It is re-cast into `FixedRectangularBinning`

once the data are provided, so see that docstring for info on the bin calculation and the meaning of `precise`

.

Binning instructions are deduced from the type of `ϵ`

as follows:

`ϵ::Int`

divides each coordinate axis into`ϵ`

equal-length intervals that cover all data.`ϵ::Float64`

divides each coordinate axis into intervals of fixed size`ϵ`

, starting from the axis minima until the data is completely covered by boxes.`ϵ::Vector{Int}`

divides the i-th coordinate axis into`ϵ[i]`

equal-length intervals that cover all data.`ϵ::Vector{Float64}`

divides the i-th coordinate axis into intervals of fixed size`ϵ[i]`

, starting from the axis minima until the data is completely covered by boxes.

`RectangularBinning`

ensures all input data are covered by extending the created ranges if need be.

`ComplexityMeasures.RegularDownsampling`

— Type```
RegularDownsampling <: MultiScaleAlgorithm
RegularDownsampling(; f::Function = Statistics.mean, scales = 1:8)
```

The original multi-scale algorithm for multiscale entropy analysis Costa2002, which yields a single downsampled time series per scale `s`

.

**Description**

Given a scalar-valued input time series `x`

, the `Regular`

multiscale algorithm downsamples and coarse-grains `x`

by splitting it into non-overlapping windows of length `s`

, and then constructing a new downsampled time series $D_t(s, f)$ by applying the function `f`

to each of the resulting length-`s`

windows.

The downsampled time series `D_t(s)`

with `t ∈ [1, 2, …, L]`

, where `L = floor(N / s)`

, is given by:

\[\{ D_t(s, f) \}_{t = 1}^{L} = \left\{ f \left( \bf x_t \right) \right\}_{t = 1}^{L} = \left\{ {f\left( (x_i)_{i = (t - 1)s + 1}^{ts} \right)} \right\}_{t = 1}^{L}\]

where `f`

is some summary statistic applied to the length-`ts-((t - 1)s + 1)`

tuples `xₖ`

. Different choices of `f`

have yield different multiscale methods appearing in the literature. For example:

`f == Statistics.mean`

yields the original first-moment multiscale sample entropy Costa2002.`f == Statistics.var`

yields the generalized multiscale sample entropy Costa2015, which uses the second-moment (variance) instead of the mean.

**Keyword Arguments**

. The downsampling levels. If`scales`

`scales`

is set to an integer, then this integer is taken as maximum number of scales (i.e. levels of downsampling), and downsampling is done over levels`1:scales`

. Otherwise, downsampling is done over the provided`scales`

(which may be a range, or some specific scales (e.g.`scales = [1, 5, 6]`

). The maximum scale level is`length(x) ÷ 2`

, but to avoid applying the method to time series that are extremely short, consider limiting the maximum scale (e.g.`scales = length(x) ÷ 5`

).

See also: `CompositeDownsampling`

.

`ComplexityMeasures.RelativeAmount`

— Type```
RelativeAmount <: ProbabilitiesEstimator
RelativeAmount()
```

The `RelativeAmount`

estimator is used with `probabilities`

and related functions to estimate probabilities over the given `OutcomeSpace`

using maximum likelihood estimation (MLE), also called plug-in estimation. See `ProbabilitiesEstimator`

for usage.

**Description**

Consider a length-`m`

outcome space $\Omega$ and random sample of length `N`

. The maximum likelihood estimate of the probability of the `k`

-th outcome $\omega_k$ is

\[p(\omega_k) = \dfrac{n_k}{N},\]

where $n_k$ is the number of times the `k`

-th outcome was observed in the (encoded) sample.

This estimation is known as *maximum likelihood estimation*. However, `RelativeAmount`

also serves as the fall-back probabilities estimator for `OutcomeSpace`

s that are not count-based and only yield "pseudo-counts", for example `WaveletOverlap`

or `PowerSpectrum`

. These outcome spaces do not yield counts, but pre-normalized numbers that can be treated as "relative frequencies" or "relative power". Hence, this estimator is called `RelativeAmount`

.

**Examples**

```
using ComplexityMeasures
x = cumsum(randn(100))
ps = probabilities(OrdinalPatterns{3}(), x) # `RelativeAmount` is the default estimator
ps_mle = probabilities(RelativeAmount(), OrdinalPatterns{3}(), x) # equivalent
ps == ps_mle # true
```

See also: `BayesianRegularization`

, `Shrinkage`

.

`ComplexityMeasures.RelativeFirstDifferenceEncoding`

— Type```
RelativeFirstDifferenceEncoding <: Encoding
RelativeFirstDifferenceEncoding(minval::Real, maxval::Real; n = 2)
```

`RelativeFirstDifferenceEncoding`

encodes a vector based on the relative position the average of the *first differences* of the vectors has with respect to a predefined minimum and maximum value (`minval`

and `maxval`

, respectively).

**Description**

This encoding is inspired by Azami2016's algorithm for amplitude-aware permutation entropy. They use a linear combination of amplitude information and first differences information of state vectors to correct probabilities. Here, however, we explicitly encode the first differences part of the correction as an a integer symbol `Λ ∈ [1, 2, …, n]`

. The amplitude part of the encoding is available as the `RelativeMeanEncoding`

encoding.

**Encoding/decoding**

When used with `encode`

, an $m$-element state vector $\bf{x} = (x_1, x_2, \ldots, x_m)$ is encoded as $Λ = \dfrac{1}{m - 1}\sum_{k=2}^m |x_{k} - x_{k-1}|$. The value of $Λ$ is then normalized to lie on the interval `[0, 1]`

, assuming that the minimum/maximum value any single $abs(x_k - x_{k-1})$ can take is `minval`

/`maxval`

, respectively. Finally, the interval `[0, 1]`

is discretized into `n`

discrete bins, enumerated by positive integers `1, 2, …, n`

, and the number of the bin that the normalized $Λ$ falls into is returned. The smaller the mean first difference of the state vector is, the smaller the bin number is. The higher the mean first difference of the state vectors is, the higher the bin number is.

When used with `decode`

, the left-edge of the bin that the normalized $Λ$ fell into is returned.

**Performance tips**

If you are encoding multiple input vectors, it is more efficient to construct a `RelativeFirstDifferenceEncoding`

instance and re-use it:

```
minval, maxval = 0, 1
encoding = RelativeFirstDifferenceEncoding(minval, maxval; n = 4)
pts = [rand(3) for i = 1:1000]
[encode(encoding, x) for x in pts]
```

`ComplexityMeasures.RelativeMeanEncoding`

— Type```
RelativeMeanEncoding <: Encoding
RelativeMeanEncoding(minval::Real, maxval::Real; n = 2)
```

`RelativeMeanEncoding`

encodes a vector based on the relative position the mean of the vector has with respect to a predefined minimum and maximum value (`minval`

and `maxval`

, respectively).

**Description**

This encoding is inspired by Azami2016's algorithm for amplitude-aware permutation entropy. They use a linear combination of amplitude information and first differences information of state vectors to correct probabilities. Here, however, we explicitly encode the amplitude-part of the correction as an a integer symbol `Λ ∈ [1, 2, …, n]`

. The first-difference part of the encoding is available as the `RelativeFirstDifferenceEncoding`

encoding.

**Encoding/decoding**

When used with `encode`

, an $m$-element state vector $\bf{x} = (x_1, x_2, \ldots, x_m)$ is encoded as $Λ = \dfrac{1}{N}\sum_{i=1}^m abs(x_i)$. The value of $Λ$ is then normalized to lie on the interval `[0, 1]`

, assuming that the minimum/maximum value any single element $x_i$ can take is `minval`

/`maxval`

, respectively. Finally, the interval `[0, 1]`

is discretized into `n`

discrete bins, enumerated by positive integers `1, 2, …, n`

, and the number of the bin that the normalized $Λ$ falls into is returned.

When used with `decode`

, the left-edge of the bin that the normalized $Λ$ fell into is returned.

`ComplexityMeasures.Renyi`

— Type```
Renyi <: InformationMeasure
Renyi(q, base = 2)
Renyi(; q = 1.0, base = 2)
```

The Rényi generalized order-`q`

entropy Rényi1961, used with `information`

to compute an entropy with units given by `base`

(typically `2`

or `MathConstants.e`

).

**Description**

Let $p$ be an array of probabilities (summing to 1). Then the Rényi generalized entropy is

\[H_q(p) = \frac{1}{1-q} \log \left(\sum_i p[i]^q\right)\]

and generalizes other known entropies, like e.g. the information entropy ($q = 1$, see Shannon1948), the maximum entropy ($q=0$, also known as Hartley entropy), or the correlation entropy ($q = 2$, also known as collision entropy).

The maximum value of the Rényi entropy is $\log_{base}(L)$, which is the entropy of the uniform distribution with $L$ the `total_outcomes`

.

`ComplexityMeasures.RenyiExtropy`

— Type```
RenyiExtropy <: InformationMeasure
RenyiExtropy(; q = 1.0, base = 2)
```

The Rényi extropy Liu2023.

**Description**

`RenyiExtropy`

is used with `information`

to compute

\[J_R(P) = \dfrac{-(n - 1) \log{(n - 1)} + (n - 1) \log{ \left( \sum_{i=1}^N {(1 - p[i])}^q \right)} }{q - 1}\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given `base`

. Alternatively, `RenyiExtropy`

can be used with `information_normalized`

, which ensures that the computed extropy is on the interval $[0, 1]$ by normalizing to to the maximal Rényi extropy, given by

\[J_R(P) = (N - 1)\log \left( \dfrac{n}{n-1} \right) .\]

`ComplexityMeasures.ReverseDispersion`

— Type```
ReverseDispersion <: ComplexityEstimator
ReverseDispersion(; c = 3, m = 2, τ = 1, check_unique = true)
```

Estimator for the reverse dispersion entropy complexity measure Li2019.

**Description**

Li2019 defines the reverse dispersion entropy as

\[H_{rde} = \sum_{i = 1}^{c^m} \left(p_i - \dfrac{1}{{c^m}} \right)^2 = \left( \sum_{i=1}^{c^m} p_i^2 \right) - \dfrac{1}{c^{m}}\]

where the probabilities $p_i$ are obtained precisely as for the `Dispersion`

probability estimator. Relative frequencies of dispersion patterns are computed using the given `encoding`

scheme , which defaults to encoding using the normal cumulative distribution function (NCDF), as implemented by `GaussianCDFEncoding`

, using embedding dimension `m`

and embedding delay `τ`

. Recommended parameter valuesLi2018 are `m ∈ [2, 3]`

, `τ = 1`

for the embedding, and `c ∈ [3, 4, …, 8]`

categories for the Gaussian mapping.

If normalizing, then the reverse dispersion entropy is normalized to `[0, 1]`

.

The minimum value of $H_{rde}$ is zero and occurs precisely when the dispersion pattern distribution is flat, which occurs when all $p_i$s are equal to $1/c^m$. Because $H_{rde} \geq 0$, $H_{rde}$ can therefore be said to be a measure of how far the dispersion pattern probability distribution is from white noise.

**Data requirements**

The input must have more than one unique element for the default `GaussianCDFEncoding`

to be well-defined. Li2018 recommends that `x`

has at least 1000 data points.

If `check_unique == true`

(default), then it is checked that the input has more than one unique value. If `check_unique == false`

and the input only has one unique element, then a `InexactError`

is thrown when trying to compute probabilities.

`ComplexityMeasures.SampleEntropy`

— Type`SampleEntropy([x]; r = 0.2std(x), kwargs...) <: ComplexityEstimator`

An estimator for the sample entropy complexity measure Richman2000, used with `complexity`

and `complexity_normalized`

.

The keyword argument `r`

is mandatory if an input timeseries `x`

is not provided.

**Keyword arguments**

`r::Real`

: The radius used when querying for nearest neighbors around points. Its value should be determined from the input data, for example as some proportion of the standard deviation of the data.`m::Int = 2`

: The embedding dimension.`τ::Int = 1`

: The embedding lag.

**Description**

An *estimator* for sample entropy using radius `r`

, embedding dimension `m`

, and embedding lag `τ`

is

\[SampEn(m,r, N) = -\ln{\dfrac{A(r, N)}{B(r, N)}}.\]

Here,

\[\begin{aligned} B(r, m, N) = \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta(d({\bf x}_i^m, {\bf x}_j^m) \leq r) \\ A(r, m, N) = \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta(d({\bf x}_i^{m+1}, {\bf x}_j^{m+1}) \leq r) \\ \end{aligned},\]

where $\theta(\cdot)$ returns 1 if the argument is true and 0 otherwise, and $d(x, y)$ computes the Chebyshev distance between $x$ and $y$, and ${\bf x}_i^{m}$ and ${\bf x}_i^{m+1}$ are `m`

-dimensional and `m+1`

-dimensional embedding vectors, where `k`

-dimensional embedding vectors are constructed from the input timeseries $x(t)$ as

\[{\bf x}_i^k = (x(i), x(i+τ), x(i+2τ), \ldots, x(i+(k-1)\tau)).\]

Quoting Richman & Moorman (2002): "SampEn(m,r,N) will be defined except when B = 0, in which case no regularity has been detected, or when A = 0, which corresponds to a conditional probability of 0 and an infinite value of SampEn(m,r,N)". In these cases, `NaN`

is returned.

If computing the normalized measure, then the resulting sample entropy is on `[0, 1]`

.

The original algorithm fixes `τ = 1`

. All formulas here are modified to account for any `τ`

.

See also: `entropy_sample`

.

`ComplexityMeasures.Schuermann`

— Type```
Schuermann <: DiscreteInfoEstimatorShannon
Schuermann(definition::Shannon; a = 1.0)
```

The `Schuermann`

estimator is used with `information`

to compute the discrete `Shannon`

entropy with the bias-corrected estimator given in Schurmann2004.

See detailed description for `GeneralizedSchuermann`

for details.

`ComplexityMeasures.SequentialPairDistances`

— Type```
SequentialPairDistances <: CountBasedOutcomeSpace
SequentialPairDistances(x::AbstractVector; n = 3, metric = Chebyshev(), m = 3, τ = 1)
SequentialPairDistances(x::AbstractStateSpaceSet; n = 3, metric = Chebyshev())
```

An outcome space based on the distribution of distances of sequential pairs of points.

This outcome space appears implicitly as part of the "distribution entropy" introduced by Li2015, which of course can be reproduced here (see example below). We've generalized the method to be used with any `InformationMeasure`

and `DiscreteInfoEstimator`

, and with valid distance `metric`

(from Distances.jl).

Input data `x`

are needed for initialization, because distances must be pre-computed to know the minimum/maximum distances needed for binning the distribution of pairwise distances. If the input is an `AbstractVector`

, then the vector is embedded before computing distances. If the input is an `AbstractStateSpaceSet`

, then the embedding step is skipped and distances are computed directly on each state vector `xᵢ ∈ x`

.

**Description**

`SequentialPairDistances`

does the following:

- Transforms the input timeseries
`x`

by first embedding it using embedding dimension`m`

and embedding lag`τ`

(or skip this step if the input is already embedded). - Computes the distances
`ds`

between sequential pairs of points according to the given`metric`

. - Divides the interval
`[minimum(ds), maximum(ds)]`

into`n`

equal-size bins by using`RectangularBinEncoding`

, then maps the distances onto these bins.

**Outcome space**

The outcome space `Ω`

for `SequentialPairDistances`

are the bins onto which the pairwise distances are mapped, encoded as the integers `1:n`

. If you need the actual bin coordinates, these can be recovered with `decode`

(see example below).

**Implements**

`codify`

. Note that the input`x`

is ignored when calling`codify`

, because the input data is already handled when constructing a`SequentialPairDistances`

.

**Examples**

The outcome bins can be retrieved as follows.

```
using ComplexityMeasures
x = rand(100)
o = SequentialPairDistances(x)
cts, outs = counts_and_outcomes(o, x)
```

Computing the "distribution entropy" with `n = 3`

bins for the distance histogram:

```
using ComplexityMeasures
x = rand(1000000)
o = SequentialPairDistances(x, n = 3, metric = Chebyshev()) # metric from original paper
h = information(Shannon(base = 2), o, x)
```

`ComplexityMeasures.Shannon`

— Type```
Shannon <: InformationMeasure
Shannon(; base = 2)
```

The Shannon Shannon1948 entropy, used with `information`

to compute:

\[H(p) = - \sum_i p[i] \log(p[i])\]

with the $\log$ at the given `base`

.

The maximum value of the Shannon entropy is $\log_{base}(L)$, which is the entropy of the uniform distribution with $L$ the `total_outcomes`

.

`ComplexityMeasures.ShannonExtropy`

— Type```
ShannonExtropy <: InformationMeasure
ShannonExtropy(; base = 2)
```

The Shannon extropy Lad2015, used with `information`

to compute

\[J(x) = -\sum_{i=1}^N (1 - p[i]) \log{(1 - p[i])},\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given `base`

.

`ComplexityMeasures.Shrinkage`

— Type```
Shrinkage{<:OutcomeSpace} <: ProbabilitiesEstimator
Shrinkage(; t = nothing, λ = nothing)
```

The `Shrinkage`

estimator is used with `probabilities`

and related functions to estimate probabilities over the given `m`

-element counting-based `OutcomeSpace`

using James-Stein-type shrinkage JamesStein1992, as presented in Hausser2009.

**Description**

The `Shrinkage`

estimator estimates a cell probability $\theta_{k}^{\text{Shrink}}$ as

\[\theta_{k}^{\text{Shrink}} = \lambda t_k + (1-\lambda) \hat{\theta}_k^{RelativeAmount},\]

where $\lambda \in [0, 1]$ is the shrinkage intensity ($\lambda = 0$ means no shrinkage, and $\lambda = 1$ means full shrinkage), and $t_k$ is the shrinkage target. Hausser2009 picks $t_k = 1/m$, i.e. the uniform distribution.

If `t == nothing`

, then $t_k$ is set to $1/m$ for all $k$, as in Hausser2009. If `λ == nothing`

(the default), then the shrinkage intensity is optimized according to Hausser2009. Hence, you should probably not pick `λ`

nor `t`

manually, unless you know what you are doing.

**Assumptions**

The `Shrinkage`

estimator assumes a fixed and known number of outcomes `m`

. Thus, using it with `probabilities_and_outcomes`

) and `allprobabilities_and_outcomes`

will yield different results, depending on whether all outcomes are observed in the input data or not. For `probabilities_and_outcomes`

, `m`

is the number of *observed* outcomes. For `allprobabilities_and_outcomes`

, `m = total_outcomes(o, x)`

, where `o`

is the `OutcomeSpace`

and `x`

is the input data.

If used with `allprobabilities_and_outcomes`

, then outcomes which have not been observed may be assigned non-zero probabilities. This might affect your results if using e.g. `missing_outcomes`

.

**Examples**

```
using ComplexityMeasures
x = cumsum(randn(100))
ps_shrink = probabilities(Shrinkage(), OrdinalPatterns{3}(), x)
```

See also: `RelativeAmount`

, `BayesianRegularization`

.

`ComplexityMeasures.SpatialBubbleSortSwaps`

— Type```
SpatialBubbleSortSwaps <: SpatialOutcomeSpace
SpatialBubbleSortSwaps(stencil, x; periodic = true)
```

`SpatialBubbleSortSwaps`

generalizes `BubbleSortSwaps`

to high-dimensional arrays by encoding pixel/voxel/hypervoxel windows in terms of how many swap operations the bubble sort algorithm requires to sort them.

What does this mean? For `BubbleSortSwaps`

the input data is embedded using embedding dimension `m`

and the number of swaps required are computed for each embedding vector. For `SpatialBubbleSortSwaps`

, the "embedding dimension" `m`

for is inferred from the number of elements in the `stencil`

, and the "embedding vectors" are the hypervoxels selected by the `stencil`

.

**Outcome space**

The outcome space `Ω`

for `SpatialBubbleSortSwaps`

is the range of integers `0:(n*(n-1)÷2)`

, corresponding to the number of swaps required by the bubble sort algorithm to sort a particular pixel/voxel/hypervoxel window.

**Arguments**

`stencil`

. Defines what local area (hyperrectangle), or which points within this area, to include around each hypervoxel (i.e. pixel in 2D). See`SpatialOrdinalPatterns`

and`SpatialDispersion`

for more information about stencils and examples of how to specify them.`x::AbstractArray`

. The input data. Must be provided because we need to know its size for optimization and bound checking.

**Keyword arguments**

`periodic::Bool`

. If`periodic == true`

, then the stencil should wrap around at the end of the array. If`periodic = false`

, then pixels whose stencil exceeds the array bounds are skipped.

**Example**

```
using ComplexityMeasures
using Random; rng = MersenneTwister(1234)
x = rand(rng, 100, 100, 100) # some 3D image
stencil = zeros(Int,2,2,2) # 3D stencil
stencil[:, :, 1] = [1 0; 1 1]
stencil[:, :, 2] = [0 1; 1 0]
o = SpatialBubbleSortSwaps(stencil, x)
# Distribution of "bubble sorting complexity" among voxel windows
counts_and_outcomes(o, x)
# "Spatial bubble Kaniadakis entropy", with shrinkage-adjusted probabilities
information(Kaniadakis(), Shrinkage(), o, x)
```

`ComplexityMeasures.SpatialDispersion`

— Type```
SpatialDispersion <: OutcomeSpace
SpatialDispersion(stencil, x::AbstractArray;
periodic = true,
c = 5,
skip_encoding = false,
L = nothing,
)
```

A dispersion-based `OutcomeSpace`

that generalises `Dispersion`

for input data that are high-dimensional arrays.

`SpatialDispersion`

is based on Azami2019's 2D square dispersion (Shannon) entropy estimator, but is here implemented as a pure probabilities probabilities estimator that is generalized for `N`

-dimensional input data `x`

, with arbitrary neighborhood regions (stencils) and (optionally) periodic boundary conditions.

In combination with `information`

and `information_normalized`

, this probabilities estimator can be used to compute (normalized) generalized spatiotemporal dispersion `InformationMeasure`

of any type.

**Arguments**

`stencil`

. Defines what local area (hyperrectangle), or which points within this area, to include around each hypervoxel (i.e. pixel in 2D). The examples below demonstrate different ways of specifying stencils. For details, see`SpatialOrdinalPatterns`

. See`SpatialOrdinalPatterns`

for more information about stencils.`x::AbstractArray`

. The input data. Must be provided because we need to know its size for optimization and bound checking.

**Keyword arguments**

`periodic::Bool`

. If`periodic == true`

, then the stencil should wrap around at the end of the array. If`periodic = false`

, then pixels whose stencil exceeds the array bounds are skipped.`c::Int`

. Determines how many discrete categories to use for the Gaussian encoding.`skip_encoding`

. If`skip_encoding == true`

,`encoding`

is ignored, and dispersion patterns are computed directly from`x`

, under the assumption that`L`

is the alphabet length for`x`

(useful for categorical or integer data). Thus, if`skip_encoding == true`

, then`L`

must also be specified. This is useful for categorical or integer-valued data.`L`

. If`L == nothing`

(default), then the number of total outcomes is inferred from`stencil`

and`encoding`

. If`L`

is set to an integer, then the data is considered pre-encoded and the number of total outcomes is set to`L`

.

**Outcome space**

The outcome space for `SpatialDispersion`

is the unique delay vectors whose elements are the the symbols (integers) encoded by the Gaussian CDF. Hence, the outcome space is all `m`

-dimensional delay vectors whose elements are all possible values in `1:c`

. There are `c^m`

such vectors.

**Description**

Estimating probabilities/entropies from higher-dimensional data is conceptually simple.

- Discretize each value (hypervoxel) in
`x`

relative to all other values`xᵢ ∈ x`

using the provided`encoding`

scheme. - Use
`stencil`

to extract relevant (discretized) points around each hypervoxel. - Construct a symbol these points.
- Take the sum-normalized histogram of the symbol as a probability distribution.
- Optionally, compute
`information`

or`information_normalized`

from this probability distribution.

**Usage**

Here's how to compute spatial dispersion entropy using the three different ways of specifying stencils.

```
x = rand(50, 50) # first "time slice" of a spatial system evolution
# Cartesian stencil
stencil_cartesian = CartesianIndex.([(0,0), (1,0), (1,1), (0,1)])
est = SpatialDispersion(stencil_cartesian, x)
information_normalized(est, x)
# Extent/lag stencil
extent = (2, 2); lag = (1, 1); stencil_ext_lag = (extent, lag)
est = SpatialDispersion(stencil_ext_lag, x)
information_normalized(est, x)
# Matrix stencil
stencil_matrix = [1 1; 1 1]
est = SpatialDispersion(stencil_matrix, x)
information_normalized(est, x)
```

To apply this to timeseries of spatial data, simply loop over the call (broadcast), e.g.:

```
imgs = [rand(50, 50) for i = 1:100]; # one image per second over 100 seconds
stencil = ((2, 2), (1, 1)) # a 2x2 stencil (i.e. dispersion patterns of length 4)
est = SpatialDispersion(stencil, first(imgs))
h_vs_t = information_normalized.(Ref(est), imgs)
```

Computing generalized spatiotemporal dispersion entropy is trivial, e.g. with `Renyi`

:

```
x = reshape(repeat(1:5, 500) .+ 0.1*rand(500*5), 50, 50)
est = SpatialDispersion(stencil, x)
information(Renyi(q = 2), est, x)
```

See also: `SpatialOrdinalPatterns`

, `GaussianCDFEncoding`

, `codify`

.

`ComplexityMeasures.SpatialOrdinalPatterns`

— Type```
SpatialOrdinalPatterns <: OutcomeSpaceModel
SpatialOrdinalPatterns(stencil, x; periodic = true)
```

A symbolic, permutation-based `OutcomeSpace`

for spatiotemporal systems that generalises `OrdinalPatterns`

to high-dimensional arrays. The order `m`

of the permutation pattern is extracted from the `stencil`

, see below.

`SpatialOrdinalPatterns`

is based on the 2D and 3D *spatiotemporal permutation entropy* estimators by Ribeiro2012 and Schlemmer2018, respectively, but is here implemented as a pure probabilities probabilities estimator that is generalized for `D`

-dimensional input array `x`

, with arbitrary regions (stencils) to get patterns form and (possibly) periodic boundary conditions.

See below for ways to specify the `stencil`

. If `periodic = true`

, then the stencil wraps around at the ends of the array. If `false`

, then collected regions with indices which exceed the array bounds are skipped.

In combination with `information`

and `information_normalized`

, this probabilities estimator can be used to compute generalized spatiotemporal permutation `InformationMeasure`

of any type.

**Outcome space**

The outcome space `Ω`

for `SpatialOrdinalPatterns`

is the set of length-`m`

ordinal patterns (i.e. permutations) that can be formed by the integers `1, 2, …, m`

, ordered lexicographically. There are `factorial(m)`

such patterns. Here `m`

refers to the number of points included in `stencil`

.

**Stencils**

The `stencil`

defines what local area to use to group hypervoxels. Each grouping of hypervoxels is mapped to an order-`m`

permutation pattern, which is then mapped to an integer as in `OrdinalPatterns`

. The `stencil`

is moved around the input array, in a sense "scanning" the input array, to collect all possible groupings allowed by the boundary condition (periodic or not).

Stencils are passed in one of the following three ways:

- As vectors of
`CartesianIndex`

which encode the offset of indices to include in the stencil, with respect to the current array index when scanning over the array. For example`stencil = CartesianIndex.([(0,0), (0,1), (1,1), (1,0)])`

. Don't forget to include the zero offset index if you want to include the hypervoxel itself, which is almost always the case. Here the stencil creates a 2x2 square extending to the bottom and right of the pixel (directions here correspond to the way Julia prints matrices by default). When passing a stencil as a vector of`CartesianIndex`

,`m = length(stencil)`

. - As a
`D`

-dimensional array (where`D`

matches the dimensionality of the input data) containing`0`

s and`1`

s, where if`stencil[index] == 1`

, the corresponding pixel is included, and if`stencil[index] == 0`

, it is not included. To generate the same estimator as in 1., use`stencil = [1 1; 1 1]`

. When passing a stencil as a`D`

-dimensional array,`m = sum(stencil)`

- As a
`Tuple`

containing two`Tuple`

s, both of length`D`

, for`D`

-dimensional data. The first tuple specifies the`extent`

of the stencil, where`extent[i]`

dictates the number of hypervoxels to be included along the`i`

th axis and`lag[i]`

the separation of hypervoxels along the same axis. This method can only generate (hyper)rectangular stencils. To create the same estimator as in the previous examples, use here`stencil = ((2, 2), (1, 1))`

. When passing a stencil using`extent`

and`lag`

,`m = prod(extent)`

.

`ComplexityMeasures.SpatialOutcomeSpace`

— TypeA convenience abstract type that makes dispatch for pixel retrieval easier.

`ComplexityMeasures.StatisticalComplexity`

— Type```
StatisticalComplexity <: ComplexityEstimator
StatisticalComplexity(; kwargs...)
```

An estimator for the statistical complexity and entropy, originally by Rosso2007 and generalized by Rosso2013.

Our implementation extends the generalization to any valid distance metric, any `OutcomeSpace`

with a priori known `total_outcomes`

, any `ProbabilitiesEstimator`

, and any normalizable discrete `InformationMeasure`

.

Used with `complexity`

.

**Keyword arguments**

`o::OutcomeSpace = OrdinalPatterns{3}()`

. The`OutcomeSpace`

, which controls how the input data are discretized.`pest::ProbabilitiesEstimator = RelativeAmount()`

: The`ProbabilitiesEstimator`

used to estimate probabilities over the discretized input data.`hest = Renyi()`

: A`DiscreteInfoEstimator`

or an`InformationMeasure`

. Any information measure that defines`information_maximum`

is valid here including extropies. The measure will be estimated using the`PlugIn`

estimator if not given an estimator.`dist <: SemiMetric = JSDivergence()`

: The distance measure (from Distances.jl) to use for estimating the distance between the estimated probability distribution and a uniform distribution with the same maximal number of outcomes.

**Description**

Statistical complexity is defined as

\[C_q[P] = \mathcal{H}_q\cdot \mathcal{Q}_q[P],\]

where $Q_q$ is a "disequilibrium" obtained from a distance-measure and $H_q$ a disorder measure. In the original paperRosso2007, this complexity measure was defined via an ordinal pattern-based probability distribution (see `OrdinalPatterns`

), using `Shannon`

entropy as the information measure, and the Jensen-Shannon divergence as a distance measure.

Our implementation is a further generalization of the complexity measure developed in Rosso2013. We let $H_q$be any normalizable `InformationMeasure`

, e.g. `Shannon`

, `Renyi`

or `Tsallis`

entropy, and we let $Q_q$ be either on the Euclidean, Wooters, Kullback, q-Kullback, Jensen or q-Jensen distance as

\[Q_q[P] = Q_q^0\cdot D[P, P_e],\]

where $D[P, P_e]$ is the distance between the obtained distribution $P$ and a uniform distribution with the same maximum number of bins, measured by the distance measure `dist`

.

**Usage**

The statistical complexity is exclusively used in combination with the chosen information measure (typically an entropy). The estimated value of the information measure can be accessed as a `Ref`

value of the struct as

```
x = randn(100)
c = StatisticalComplexity()
compl = complexity(c, x)
entr = first(entropy_complexity(c, x)) # both complexity and entropy value
```

`complexity(c::StatisticalComplexity, x)`

returns only the statistical complexity. To obtain both the value of the entropy (or other information measure) and the statistical complexity together as a `Tuple`

, use the wrapper `entropy_complexity`

.

See also: `entropy_complexity_curves`

.

`ComplexityMeasures.StretchedExponential`

— Type```
StretchedExponential <: InformationMeasure
StretchedExponential(; η = 2.0, base = 2)
```

The stretched exponential, or Anteneodo-Plastino, entropy Anteneodo1999, used with `information`

to compute

\[S_{\eta}(p) = \sum_{i = 1}^N \Gamma \left( \dfrac{\eta + 1}{\eta}, - \log_{base}(p_i) \right) - p_i \Gamma \left( \dfrac{\eta + 1}{\eta} \right),\]

where $\eta \geq 0$, $\Gamma(\cdot, \cdot)$ is the upper incomplete Gamma function, and $\Gamma(\cdot) = \Gamma(\cdot, 0)$ is the Gamma function. Reduces to `Shannon`

entropy for `η = 1.0`

.

The maximum entropy for `StrechedExponential`

is a rather complicated expression involving incomplete Gamma functions (see source code).

`ComplexityMeasures.TransferOperator`

— Type```
TransferOperator <: OutcomeSpace
TransferOperator(b::AbstractBinning; warn_precise = true, rng = Random.default_rng())
```

An `OutcomeSpace`

based on binning data into rectangular boxes dictated by the given binning scheme `b`

.

When used with `probabilities`

, then the transfer (Perron-Frobenius) operator is approximated over the bins, then bin probabilities are estimated as the invariant measure associated with that transfer operator. Assumes that the input data are sequential (time-ordered).

This implementation follows the grid estimator approach in Diego2019.

**Precision**

The default behaviour when using `RectangularBinning`

or `FixedRectangularBinning`

is to accept some loss of precision on the bin boundaries for speed-ups, but this may lead to issues for `TransferOperator`

where some points may be encoded as the symbol `-1`

("outside the binning"). The `warn_precise`

keyword controls whether the user is warned when a less precise binning is used.

**Outcome space**

The outcome space for `TransferOperator`

is the set of unique bins constructed from `b`

. Bins are identified by their left (lowest-value) corners, are given in data units, and are returned as `SVector`

s.

**Bin ordering**

Bins returned by `probabilities_and_outcomes`

are ordered according to first appearance (i.e. the first time the input (multivariate) timeseries visits the bin). Thus, if

```
b = RectangularBinning(4)
est = TransferOperator(b)
probs, outcomes = probabilities_and_outcomes(x, est) # x is some timeseries
```

then `probs[i]`

is the invariant measure (probability) of the bin `outcomes[i]`

, which is the `i`

-th bin visited by the timeseries with nonzero measure.

**Description**

The transfer operator $P^{N}$is computed as an `N`

-by-`N`

matrix of transition probabilities between the states defined by the partition elements, where `N`

is the number of boxes in the partition that is visited by the orbit/points.

If $\{x_t^{(D)} \}_{n=1}^L$ are the $L$ different $D$-dimensional points over which the transfer operator is approximated, $\{ C_{k=1}^N \}$ are the $N$ different partition elements (as dictated by `ϵ`

) that gets visited by the points, and $\phi(x_t) = x_{t+1}$, then

\[P_{ij} = \dfrac {\#\{ x_n | \phi(x_n) \in C_j \cap x_n \in C_i \}} {\#\{ x_m | x_m \in C_i \}},\]

where $\#$ denotes the cardinal. The element $P_{ij}$ thus indicates how many points that are initially in box $C_i$ end up in box $C_j$ when the points in $C_i$ are projected one step forward in time. Thus, the row $P_{ik}^N$ where $k \in \{1, 2, \ldots, N \}$ gives the probability of jumping from the state defined by box $C_i$ to any of the other $N$ states. It follows that $\sum_{k=1}^{N} P_{ik} = 1$ for all $i$. Thus, $P^N$ is a row/right stochastic matrix.

**Invariant measure estimation from transfer operator**

The left invariant distribution $\mathbf{\rho}^N$ is a row vector, where $\mathbf{\rho}^N P^{N} = \mathbf{\rho}^N$. Hence, $\mathbf{\rho}^N$ is a row eigenvector of the transfer matrix $P^{N}$ associated with eigenvalue 1. The distribution $\mathbf{\rho}^N$ approximates the invariant density of the system subject to `binning`

, and can be taken as a probability distribution over the partition elements.

In practice, the invariant measure $\mathbf{\rho}^N$ is computed using `invariantmeasure`

, which also approximates the transfer matrix. The invariant distribution is initialized as a length-`N`

random distribution which is then applied to $P^{N}$. For reproducibility in this step, set the `rng`

. The resulting length-`N`

distribution is then applied to $P^{N}$ again. This process repeats until the difference between the distributions over consecutive iterations is below some threshold.

See also: `RectangularBinning`

, `FixedRectangularBinning`

, `invariantmeasure`

.

`ComplexityMeasures.TransferOperatorApproximationRectangular`

— Type```
TransferOperatorApproximationRectangular(to, binning::RectangularBinning, mini,
edgelengths, bins, sort_idxs)
```

The `N`

-by-`N`

matrix `to`

is an approximation to the transfer operator, subject to the given `binning`

, computed over some set of sequentially ordered points.

For convenience, `mini`

and `edgelengths`

provide the minima and box edge lengths along each coordinate axis, as determined by applying `ϵ`

to the points. The coordinates of the (leftmost, if axis is ordered low-high) box corners are given in `bins`

.

Only bins actually visited by the points are considered, and `bins`

give the coordinates of these bins. The element `bins[i]`

correspond to the `i`

-th state of the system, which corresponds to the `i`

-th column/row of the transfer operator `to`

.

`sort_idxs`

contains the indices that would sort the input points. `visitors`

is a vector of vectors, where `visitors[i]`

contains the indices of the (sorted) points that visits `bins[i]`

.

See also: `RectangularBinning`

.

`ComplexityMeasures.Tsallis`

— Type```
Tsallis <: InformationMeasure
Tsallis(q; k = 1.0, base = 2)
Tsallis(; q = 1.0, k = 1.0, base = 2)
```

The Tsallis generalized order-`q`

entropy Tsallis1988, used with `information`

to compute an entropy.

`base`

only applies in the limiting case `q == 1`

, in which the Tsallis entropy reduces to `Shannon`

entropy.

**Description**

The Tsallis entropy is a generalization of the Boltzmann-Gibbs entropy, with `k`

standing for the Boltzmann constant. It is defined as

\[S_q(p) = \frac{k}{q - 1}\left(1 - \sum_{i} p[i]^q\right)\]

The maximum value of the Tsallis entropy is $k(L^{1 - q} - 1)/(1 - q)$, with $L$ the `total_outcomes`

.

`ComplexityMeasures.TsallisExtropy`

— Type```
TsallisExtropy <: InformationMeasure
TsallisExtropy(; base = 2)
```

The Tsallis extropy Xue2023.

**Description**

`TsallisExtropy`

is used with `information`

to compute

\[J_T(P) = k \dfrac{N - 1 - \sum_{i=1}^N ( 1 - p[i])^q}{q - 1}\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given `base`

. Alternatively, `TsallisExtropy`

can be used with `information_normalized`

, which ensures that the computed extropy is on the interval $[0, 1]$ by normalizing to to the maximal Tsallis extropy, given by

\[J_T(P) = \dfrac{(N - 1)N^{q - 1} - (N - 1)^q}{(q - 1)N^{q - 1}}\]

`ComplexityMeasures.UniqueElements`

— Type`UniqueElements()`

An `OutcomeSpace`

based on straight-forward counting of distinct elements in a univariate time series or multivariate dataset. This is the same as giving no estimator to `probabilities`

.

**Outcome space**

The outcome space is the unique sorted values of the input. Hence, input `x`

is needed for a well-defined `outcome_space`

.

**Implements**

`codify`

. Used for encoding inputs where ordering matters (e.g. time series).

`ComplexityMeasures.UniqueElementsEncoding`

— Type```
UniqueElementsEncoding <: Encoding
UniqueElementsEncoding(x)
```

`UniqueElementsEncoding`

is a generic encoding that encodes each `xᵢ ∈ unique(x)`

to one of the positive integers. The `xᵢ`

are encoded according to the order of their first appearance in the input data.

The constructor requires the input data `x`

, since the number of possible symbols is `length(unique(x))`

.

**Example**

```
using ComplexityMeasures
x = ['a', 2, 5, 2, 5, 'a']
e = UniqueElementsEncoding(x)
encode.(Ref(e), x) == [1, 2, 3, 2, 3, 1] # true
```

`ComplexityMeasures.ValueBinning`

— Type`ValueBinning(b::AbstractBinning) <: OutcomeSpace`

An `OutcomeSpace`

based on binning the values of the data as dictated by the binning scheme `b`

and formally computing their histogram, i.e., the frequencies of points in the bins. An alias to this is `VisitationFrequency`

. Available binnings are subtypes of `AbstractBinning`

.

The `ValueBinning`

estimator has a linearithmic time complexity (`n log(n)`

for `n = length(x)`

) and a linear space complexity (`l`

for `l = dimension(x)`

). This allows computation of probabilities (histograms) of high-dimensional datasets and with small box sizes `ε`

without memory overflow and with maximum performance. For performance reasons, the probabilities returned never contain 0s and are arbitrarily ordered.

`ValueBinning(ϵ::Union{Real,Vector})`

A convenience method that accepts same input as `RectangularBinning`

and initializes this binning directly.

**Outcomes**

The outcome space for `ValueBinning`

is the unique bins constructed from `b`

. Each bin is identified by its left (lowest-value) corner, because bins are always left-closed-right-open intervals `[a, b)`

. The bins are in data units, not integer (cartesian indices units), and are returned as `SVector`

s, i.e., same type as input data.

For convenience, `outcome_space`

returns the outcomes in the same array format as the underlying binning (e.g., `Matrix`

for 2D input).

For `FixedRectangularBinning`

the `outcome_space`

is well-defined from the binning, but for `RectangularBinning`

input `x`

is needed as well.

**Implements**

`codify`

. Used for encoding inputs where ordering matters (e.g. time series).

`ComplexityMeasures.ValueHistogram`

— Type`ValueHistogram`

An alias for `ValueBinning`

.

`ComplexityMeasures.Vasicek`

— Type```
Vasicek <: DifferentialInfoEstimator
Vasicek(definition = Shannon(); m::Int = 1)
```

The `Vasicek`

estimator computes the `Shannon`

differential `information`

of a timeseries using the method from Vasicek1976, with logarithms to the `base`

specified in `definition`

.

The `Vasicek`

estimator belongs to a class of differential entropy estimators based on order statistics, of which Vasicek1976 was the first. It only works for *timeseries* input.

**Description**

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function$f : \mathbb{R} \to \mathbb{R}$. `Vasicek`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The `Vasicek`

`Shannon`

differential entropy estimate is then

\[\hat{H}_V(\bar{X}, m) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{n}{2m} (\bar{X}_{(i+m)} - \bar{X}_{(i-m)}) \right]\]

**Usage**

In practice, choice of `m`

influences how fast the entropy converges to the true value. For small value of `m`

, convergence is slow, so we recommend to scale `m`

according to the time series length `n`

and use `m >= n/100`

(this is just a heuristic based on the tests written for this package).

See also: `information`

, `Correa`

, `AlizadehArghami`

, `Ebrahimi`

, `DifferentialInfoEstimator`

.

`ComplexityMeasures.VisitationFrequency`

— Type`VisitationFrequency`

An alias for `ValueBinning`

.

`ComplexityMeasures.WaveletOverlap`

— Type`WaveletOverlap([wavelet]) <: OutcomeSpace`

An `OutcomeSpace`

based on the maximal overlap discrete wavelet transform (MODWT).

When used with `probabilities`

, the MODWT is applied to a signal, then probabilities are computed as the (normalized) energies at different wavelet scales. These probabilities are used to compute the wavelet entropy according to Rosso2001. Input timeseries `x`

is needed for a well-defined outcome space.

By default the wavelet `Wavelets.WT.Daubechies{12}()`

is used. Otherwise, you may choose a wavelet from the `Wavelets`

package (it must subtype `OrthoWaveletClass`

).

**Outcome space**

The outcome space for `WaveletOverlap`

are the integers `1, 2, …, N`

enumerating the wavelet scales. To obtain a better understanding of what these mean, we prepared a notebook you can view online. As such, this estimator only works for timeseries input and input `x`

is needed for a well-defined `outcome_space`

.

`ComplexityMeasures.WeightedOrdinalPatterns`

— Type```
WeightedOrdinalPatterns <: OutcomeSpace
WeightedOrdinalPatterns{m}(τ = 1, lt::Function = ComplexityMeasures.isless_rand)
```

A variant of `OrdinalPatterns`

that also incorporates amplitude information, based on the weighted permutation entropy Fadlallah2013. The outcome space and arguments are the same as in `OrdinalPatterns`

.

**Description**

For each ordinal pattern extracted from each state (or delay) vector, a weight is attached to it which is the variance of the vector. Probabilities are then estimated by summing the weights corresponding to the same pattern, instead of just counting the occurrence of the same pattern.

*Note: in equation 7, section III, of the original paper, the authors write*

\[w_j = \dfrac{1}{m}\sum_{k=1}^m (x_{j-(k-1)\tau} - \mathbf{\hat{x}}_j^{m, \tau})^2.\]

*But given the formula they give for the arithmetic mean, this is **not** the variance of the delay vector $\mathbf{x}_i$, because the indices are mixed: $x_{j+(k-1)\tau}$ in the weights formula, vs. $x_{j+(k+1)\tau}$ in the arithmetic mean formula. Here, delay embedding and computation of the patterns and their weights are completely separated processes, ensuring that we compute the arithmetic mean correctly for each vector of the input dataset (which may be a delay-embedded timeseries).

`ComplexityMeasures.Zhu`

— Type```
Zhu <: DifferentialInfoEstimator
Zhu(; definition = Shannon(), k = 1, w = 0)
```

The `Zhu`

estimator Zhu2015 is an extension to `KozachenkoLeonenko`

, and computes the `Shannon`

differential `information`

of a multi-dimensional `StateSpaceSet`

, with logarithms to the `base`

specified in `definition`

.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. `Zhu`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))]\]

by approximating densities within hyperrectangles surrounding each point `xᵢ ∈ x`

using using `k`

nearest neighbor searches. `w`

is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`

, meaning that only the point itself is excluded when searching for neighbours).

See also: `information`

, `KozachenkoLeonenko`

, `DifferentialInfoEstimator`

.

`ComplexityMeasures.ZhuSingh`

— Type```
ZhuSingh <: DifferentialInfoEstimator
ZhuSingh(definition = Shannon(); k = 1, w = 0)
```

The `ZhuSingh`

estimator Zhu2015 computes the `Shannon`

differential `information`

of a multi-dimensional `StateSpaceSet`

, with logarithms to the `base`

specified in `definition`

.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. `ZhuSingh`

estimates the `Shannon`

differential entropy

\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

Like `Zhu`

, this estimator approximates probabilities within hyperrectangles surrounding each point `xᵢ ∈ x`

using using `k`

nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension to the entropy estimator in Singh2003.

`w`

is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`

, meaning that only the point itself is excluded when searching for neighbours).

See also: `information`

, `DifferentialInfoEstimator`

.

`ComplexityMeasures.AAPE`

— Function`AAPE(x, A::Real = 0.5, m::Int = length(x))`

Encode relative amplitude information of the elements of `a`

.

`A = 1`

emphasizes only average values.`A = 0`

emphasizes changes in amplitude values.`A = 0.5`

equally emphasizes average values and changes in the amplitude values.

`ComplexityMeasures.allcounts_and_outcomes`

— Method`allcounts_and_outcomes(o::OutcomeSpace, x::Array_or_SSSet) → (cts::Counts{<:Integer, 1}, Ω)`

Like `counts_and_outcomes`

, but ensures that *all* outcomes `Ωᵢ ∈ Ω`

, where `Ω = outcome_space(o, x)`

), are included.

Outcomes that do not occur in the data `x`

get a 0 count.

`ComplexityMeasures.allprobabilities_and_outcomes`

— Method```
allprobabilities_and_outcomes(est::ProbabilitiesEstimator, x::Array_or_SSSet) → (p::Probabilities, outs)
allprobabilities_and_outcomes(o::OutcomeSpace, x::Array_or_SSSet) → (p::Probabilities, outs)
```

The same as `probabilities_and_outcomes`

, but ensures that outcomes with `0`

probability are explicitly added in the returned vector. This means that `p[i]`

is the probability of `ospace[i]`

, with `ospace =`

`outcome_space`

`(est, x)`

.

This function is useful in cases where one wants to compare the probability mass functions of two different input data `x, y`

under the same estimator. E.g., to compute the KL-divergence of the two PMFs assumes that the obey the same indexing. This is not true for `probabilities`

even with the same `est`

, due to the skipping of 0 entries, but it is true for `allprobabilities_and_outcomes`

.

`ComplexityMeasures.apply_multiscale`

— Function`apply_multiscale(alg::MultiScaleAlgorithm, f::Function, args...)`

Define multiscale dispatch for the function `f`

(either `information`

, `complexity`

or their normalized variants) to downsampled timeseries resulting from coarse-graining `last(args)`

(the input data) using coarse-graining algorithm `alg`

with arguments `args[1:end-1]`

(the estimation parameters).

`ComplexityMeasures.ball_volume`

— MethodVolume of a unit ball in R^d.

`ComplexityMeasures.base10_to_factorial`

— Function```
base10_to_factorial(s::Int,
ndigits::Int = ndigits_in_factorial_base(s)) → f::SVector{ndigits, Int}
```

Convert a base-10 integer to its factorial number system representation. `f`

is a vector where `f[k]`

is the multiplier of `factorial(k - 1)`

.

For example, the base-10 integer `567`

, in the factorial number system, is $4\cdot 5! + 3\cdot 4! + 2\cdot 3! + 1\cdot 2! + 1\cdot 1! + 0\cdot 0!$. For this example, `base10_to_factorial`

would return the `SVector`

`[4, 3, 2, 1, 1, 0]`

.

`ndigits`

fixes the number of digits in `f`

(this just prepends a zero to `f`

for each extraneous radix/base). This is useful when using factorial number for decoding Lehmer codes into permutations

`ComplexityMeasures.cartesian_bin_index`

— Method`cartesian_bin_index(e::RectangularBinEncoding, point::SVector)`

Return the cartesian index of the given `point`

within the binning encapsulated in `e`

. Internal function called by `encode`

.

`ComplexityMeasures.center_neighborhood!`

— Method`center_neighborhood!(C, c, xᵢ, neighbors)`

Center the point `xᵢ`

, as well as each of its neighboring points `nⱼ ∈ neighbors`

, to the (precomputed) centroid `c`

of the points `{xᵢ, n₁, n₂, …, nₖ}`

, and store the centered vectors in the pre-allocated vector of vectors `C`

.

`ComplexityMeasures.codify`

— Function```
codify(o::OutcomeSpace, x::Vector) → s::Vector{Int}
codify(o::OutcomeSpace, x::AbstractStateSpaceSet{D}) → s::NTuple{D, Vector{Int}
```

Codify `x`

according to the outcome space `o`

. If `x`

is a `Vector`

, then a `Vector{<:Integer}`

is returned. If `x`

is a `StateSpaceSet{D}`

, then symbolization is done column-wise and an `NTuple{D, Vector{<:Integer}}`

is returned, where `D = dimension(x)`

.

**Description**

The reason this function exists is that we don't always want to `encode`

the entire input `x`

at once. Sometimes, it is desirable to first apply some transformation to `x`

first, then apply `Encoding`

s in a point-wise manner in the transformed space. (the `OutcomeSpace`

dictates this transformation). This is useful for encoding timeseries data.

The length of the returned `s`

depends on the `OutcomeSpace`

. Some outcome spaces preserve the input data length (e.g. `UniqueElements`

), while some outcome spaces (e.g. `OrdinalPatterns`

) do e.g. delay embeddings before encoding, so that `length(s) < length(x)`

.

`ComplexityMeasures.complexity`

— Method`complexity(c::ComplexityEstimator, x) → m::Real`

Estimate a complexity measure according to `c`

for input data `x`

, where `c`

is an instance of any subtype of `ComplexityEstimator`

:

`ComplexityMeasures.complexity_normalized`

— Method`complexity_normalized(c::ComplexityEstimator, x) → m::Real ∈ [a, b]`

The same as `complexity`

, but the result is normalized to the interval `[a, b]`

, where `[a, b]`

depends on `c`

.

`ComplexityMeasures.compute_ϕ`

— Method```
compute_ϕ(x::AbstractVector{T}; r = 0.2 * Statistics.std(x), k::Int = 2,
τ::Int = 1, base = MathConstants.e) where T <: Real
```

Construct the embedding

\[u = \{{\bf u}_n \}_{n = 1}^{N - k + 1} = \{[x(i), x(i + 1), \ldots, x(i + k - 1)]\}_{n = 1}^{N - k + 1}\]

and use a tree-and-nearest-neighbor search approach to compute

\[\phi^k(r) = \dfrac{1}{N - kτ + 1} \sum_{i}^{N - kτ + 1} \log_{b}{(C_i^k(r))},\]

taking logarithms to `base`

$b$, and where

\[C_i^k(r) = \textrm{number of } j \textrm{ such that } d({\bf u}_i, {\bf u}_j) < r,\]

where $d$ is the maximum (Chebyshev) distance, `r`

is the tolerance, and `N`

is the length of the original scalar-valued time series `x`

.

`ComplexityMeasures.convert_logunit`

— Method`convert_logunit(h_a::Real, base_from, base_to) → h_b`

Convert a number `h_a`

computed with logarithms to base `base_from`

to an entropy `h_b`

computed with logarithms to base `base_to`

. This can be used to convert the "unit" of an entropy.

`ComplexityMeasures.counts`

— Function`counts(o::OutcomeSpace, x) → cts::Counts`

Compute the same counts as in the `counts_and_outcomes`

function, with two differences:

- Do not explicitly return the outcomes.
- If the outcomes are not estimated for free while estimating the counts, a special integer type is used to enumerate the outcomes, to avoid the computational cost of estimating the outcomes.

`ComplexityMeasures.counts_and_outcomes`

— Method`counts_and_outcomes(o::OutcomeSpace, x) → (cts::Counts, Ω)`

Discretize/encode `x`

(which must be sortable) into a finite set of outcomes `Ω`

specified by the provided `OutcomeSpace`

`o`

, and then count how often each outcome `Ωᵢ ∈ Ω`

(i.e. each "discretized value", or "encoded symbol") appears.

Return a tuple where the first element is a `Counts`

instance, which is vector-like and contains the counts, and where the second element `Ω`

are the outcomes corresponding to the counts, such that `cts[i]`

is the count for the outcome `Ω[i]`

.

The outcomes are actually included in `cts`

, and you can use the `outcomes`

function on the `cts`

to get them. `counts_and_outcomes`

returns both for backwards compatibility.

`counts_and_outcomes(x) → cts::Counts`

If no `OutcomeSpace`

is specified, then `UniqueElements`

is used as the outcome space.

**Description**

For `OutcomeSpace`

s that uses `encode`

to discretize, it is possible to count how often each outcome $\omega_i \in \Omega$, where $\Omega$ is the set of possible outcomes, is observed in the discretized/encoded input data. Thus, we can assign to each outcome $\omega_i$ a count $f(\omega_i)$, such that $\sum_{i=1}^N f(\omega_i) = N$, where $N$ is the number of observations in the (encoded) input data. `counts`

returns the counts $f(\omega_i)_{obs}$ and outcomes only for the *observed* outcomes $\omega_i^{obs}$ (those outcomes that actually appear in the input data). If you need the counts for *unobserved* outcomes as well, use `allcounts_and_outcomes`

.

`ComplexityMeasures.decode`

— Function`decode(c::Encoding, i::Integer) -> ω`

Decode an encoded element `i`

into the outcome `ω ∈ Ω`

it corresponds to. `Ω`

is the `outcome_space`

that uses encoding `c`

.

`ComplexityMeasures.distance_to_whitenoise`

— Method```
distance_to_whitenoise(estimator::ReverseDispersion, p::Probabilities;
normalize = false)
```

Compute the distance of the probability distribution `p`

from a uniform distribution, given the parameters of `estimator`

(which must be known beforehand).

If `normalize == true`

, then normalize the value to the interval `[0, 1]`

by using the parameters of `estimator`

.

Used to compute reverse dispersion entropy(`ReverseDispersion`

; Li et al., 2019Li2019).

`ComplexityMeasures.downsample`

— Method`downsample(algorithm::MultiScaleAlgorithm, s::Int, x)`

Downsample and coarse-grain `x`

to scale `s`

according to the given `MultiScaleAlgorithm`

. The return type depends on `algorithm`

.

`ComplexityMeasures.encode`

— Function`encode(c::Encoding, χ) -> i::Int`

Encode an element `χ ∈ x`

of input data `x`

(those given to e.g., `counts`

) into the **positive integers** using encoding `c`

. The special value of `i = -1`

is used as a return value for inappropriate elements `χ`

that cannot be encoded according to `c`

.

`ComplexityMeasures.entropy`

— Method`entropy(args...)`

`entropy`

is nothing more than a call to `information`

that will simply throw an error if used with an information measure that is not an entropy.

`ComplexityMeasures.entropy_approx`

— Method`entropy_approx(x; m = 2, τ = 1, r = 0.2 * Statistics.std(x), base = MathConstants.e)`

Convenience syntax for computing the approximate entropy (Pincus, 1991) for timeseries `x`

.

This is just a wrapper for `complexity(ApproximateEntropy(; m, τ, r, base), x)`

(see also `ApproximateEntropy`

).

`ComplexityMeasures.entropy_complexity`

— Method`entropy_complexity(c::StatisticalComplexity, x) → (h, compl)`

Return a information measure `h`

and the corresponding `StatisticalComplexity`

value `compl`

.

Useful when wanting to plot data on the "entropy-complexity plane". See also `entropy_complexity_curves`

.

`ComplexityMeasures.entropy_complexity_curves`

— Method```
entropy_complexity_curves(c::StatisticalComplexity;
num_max=1, num_min=1000) -> (min_entropy_complexity, max_entropy_complexity)
```

Calculate the maximum complexity-entropy curve for the statistical complexity according to Rosso2007 for `num_max * total_outcomes(c.o)`

different values of the normalized information measure of choice (in case of the maximum complexity curves) and `num_min`

different values of the normalized information measure of choice (in case of the minimum complexity curve).

This function can also be used to compute the maximum "complexity-extropy curve" if `c.hest`

is e.g. `ShannonExtropy`

, which is the equivalent of the complexity-entropy curves, but using extropy instead of entropy.

**Description**

The way the statistical complexity is designed, there is a minimum and maximum possible complexity for data with a given value of an information measure. The calculation time of the maximum complexity curve grows as `O(total_outcomes(c.o)^2)`

, and thus takes very long for high numbers of outcomes. This function is inspired by S. Sippels implementation in statcomp Sippel2016.

This function will work with any `ProbabilitiesEstimator`

where `total_outcomes`

is known a priori.

`ComplexityMeasures.entropy_dispersion`

— Method`entropy_dispersion(x; base = 2, kwargs...)`

Compute the dispersion entropy. This function is just a convenience call to:

```
est = Dispersion(kwargs...)
information(Shannon(base), est, x)
```

See `Dispersion`

for more info.

`ComplexityMeasures.entropy_distribution`

— Method`entropy_distribution(x; τ = 1, m = 3, n = 3, base = 2)`

Compute the distribution entropy Li2015 of `x`

using embedding dimension `m`

with delay/lag `τ`

, using the Chebyshev distance metric, and using an `n`

-element equally-spaced binning over the distribution of distances to estimate probabilities.

This function is just a convenience call to:

```
x = rand(1000000)
o = SequentialPairDistances(x, n, m, τ, metric = Chebyshev())
h = information(Shannon(base = 2), o, x)
```

See `SequentialPairDistances`

for more info.

`ComplexityMeasures.entropy_permutation`

— Method`entropy_permutation(x; τ = 1, m = 3, base = 2)`

Compute the permutation entropy of `x`

of order `m`

with delay/lag `τ`

. This function is just a convenience call to:

```
est = OrdinalPatterns(; m, τ)
information(Shannon(base), est, x)
```

See `OrdinalPatterns`

for more info. Similarly, one can use `WeightedOrdinalPatterns`

or `AmplitudeAwareOrdinalPatterns`

for the weighted/amplitude-aware versions.

`ComplexityMeasures.entropy_sample`

— Method`entropy_sample(x; r = 0.2std(x), m = 2, τ = 1, normalize = true)`

Convenience syntax for estimating the (normalized) sample entropy (Richman & Moorman, 2000) of timeseries `x`

.

This is just a wrapper for `complexity(SampleEntropy(; r, m, τ, base), x)`

.

See also: `SampleEntropy`

, `complexity`

, `complexity_normalized`

).

`ComplexityMeasures.entropy_wavelet`

— Method`entropy_wavelet(x; wavelet = Wavelets.WT.Daubechies{12}(), base = 2)`

Compute the wavelet entropy. This function is just a convenience call to:

```
est = WaveletOverlap(wavelet)
information(Shannon(base), est, x)
```

See `WaveletOverlap`

for more info.

`ComplexityMeasures.fasthist!`

— Method`fasthist!(x) → c::Vector{Int}`

Count the occurrences `c`

of the unique data values in `x`

, so that `c[i]`

is the number of times the value `sort!(unique(x))[i]`

occurs. Hence, this method is useful mostly when `x`

contains integer or categorical data.

Prior to counting, `x`

is sorted, so this function also mutates `x`

. Therefore, it is called with `copy`

in higher level API when necessary. This function works for any `x`

for which `sort!(x)`

works.

`ComplexityMeasures.fasthist!`

— Method`fasthist!(x, weights) → c::Vector{Real}`

Similar to `fasthist!(x)`

, but here the `weights`

are summed up for each unique entry of `x`

. `x`

is sorted just like in `fasthist!(x)`

.

`ComplexityMeasures.fasthist`

— Method`fasthist(c::RectangularBinEncoding, x::Vector_or_SSSet)`

Intermediate method that runs `fasthist!`

in the encoded space and returns the encoded space histogram (counts) and corresponding bins. Also skips any instances of out-of-bound points for the histogram.

`ComplexityMeasures.hidefields`

— Method`hidefields(::Type{T})`

Returns an iterable of symbols incidating fields to hide for instances of type `T`

.

`ComplexityMeasures.information`

— Method`information(est::DifferentialInfoEstimator, x) → h::Real`

Estimate a **differential information measure** using the provided `DifferentialInfoEstimator`

and input data `x`

.

**Description**

The overwhelming majority of differential estimators estimate the `Shannon`

entropy. If the same estimator can estimate different information measures (e.g. it can estimate both `Shannon`

and `Tsallis`

), then the information measure is provided as an argument to the estimator itself.

See the table of differential information measure estimators in the docs for all differential information measure estimators.

Currently, unlike for the discrete information measures, this method doesn't involve explicitly first computing a probability density function and then passing this density to an information measure definition. But in the future, we want to establish a `density`

API similar to the `probabilities`

API.

**Examples**

To compute the differential version of a measure, give it as the first argument to a `DifferentialInfoEstimator`

and pass it to `information`

.

```
x = randn(1000)
h_sh = information(Kraskov(Shannon()), x)
h_vc = information(Vasicek(Shannon()), x)
```

A normal distribution has a base-e Shannon differential entropy of `0.5*log(2π) + 0.5`

nats.

```
est = Kraskov(k = 5, base = ℯ) # Base `ℯ` for nats.
h = information(est, randn(2_000_000))
abs(h - 0.5*log(2π) - 0.5) # ≈ 0.0001
```

`ComplexityMeasures.information`

— Method```
information([die::DiscreteInfoEstimator,] [est::ProbabilitiesEstimator,] o::OutcomeSpace, x) → h::Real
information(o::OutcomeSpace, x) → h::Real
```

Estimate a discrete information measure from input data `x`

using the provided `DiscreteInfoEstimator`

and `ProbabilitiesEstimator`

over the given `OutcomeSpace`

.

As an alternative, you can provide an `InformationMeasure`

for the first argument (`die`

) which will default to `PlugIn`

estimation) for the information estimation. You may also skip the first argument (`die`

), in which case `Shannon()`

will be used. You may also skip the second argument (`est`

), which will default to the `RelativeAmount`

probabilities estimator. Note that some information measure estimators (e.g., `GeneralizedSchuermann`

) operate directly on counts and hence ignore `est`

.

```
information([e::DiscreteInfoEstimator,] p::Probabilities) → h::Real
information([e::DiscreteInfoEstimator,] c::Counts) → h::Real
```

Like above, but estimate the information measure from the pre-computed `Probabilities`

`p`

or `Counts`

. Counts are converted into probabilities using `RelativeAmount`

, unless the estimator `e`

uses counts directly.

See also: `information_maximum`

, `information_normalized`

for a normalized version.

**Examples (naive estimation)**

The simplest way to estimate a discrete measure is to provide the `InformationMeasure`

directly in combination with an `OutcomeSpace`

. This will use the "naive" `PlugIn`

estimator for the measure, and the "naive" `RelativeAmount`

estimator for the probabilities.

```
x = randn(100) # some input data
o = ValueBinning(RectangularBinning(5)) # a 5-bin histogram outcome space
h_s = information(Shannon(), o, x)
```

Here are some more examples:

```
x = [rand(Bool) for _ in 1:10000] # coin toss
ps = probabilities(x) # gives about [0.5, 0.5] by definition
h = information(ps) # gives 1, about 1 bit by definition (Shannon entropy by default)
h = information(Shannon(), ps) # syntactically equivalent to the above
h = information(Shannon(), UniqueElements(), x) # syntactically equivalent to above
h = information(Renyi(2.0), ps) # also gives 1, order `q` doesn't matter for coin toss
h = information(OrdinalPatterns(;m=3), x) # gives about 2, again by definition
```

**Examples (bias-corrected estimation)**

It is known that both `PlugIn`

estimation for information measures and `RelativeAmount`

estimation for probabilities are biased. The scientific literature abounds with estimators that correct for this bias, both on the measure-estimation level and on the probability-estimation level. We thus provide the option to use any `DiscreteInfoEstimator`

in combination with any `ProbabilitiesEstimator`

for improved estimates. Note that custom probabilites estimators will only work with counting-compatible `OutcomeSpace`

.

```
x = randn(100)
o = ValueBinning(RectangularBinning(5))
# Estimate Shannon entropy estimation using various dedicated estimators
h_s = information(MillerMadow(Shannon()), RelativeAmount(), o, x)
h_s = information(HorvitzThompson(Shannon()), Shrinkage(), o, x)
h_s = information(Schuermann(Shannon()), Shrinkage(), o, x)
# Estimate information measures using the generic `Jackknife` estimator
h_r = information(Jackknife(Renyi()), Shrinkage(), o, x)
j_t = information(Jackknife(TsallisExtropy()), BayesianRegularization(), o, x)
j_r = information(Jackknife(RenyiExtropy()), RelativeAmount(), o, x)
```

`ComplexityMeasures.information_maximum`

— Method`information_maximum(e::InformationMeasure, o::OutcomeSpace [, x])`

Return the maximum value of the given information measure can have, given input data `x`

and the given outcome space (the `OutcomeSpace`

may also be specified by a `ProbabilitiesEstimator`

).

Like in `outcome_space`

, for some outcome spaces, the possible outcomes are known without knowledge of input `x`

, in which case the function dispatches to `information_maximum(e, o)`

.

`information_maximum(e::InformationMeasure, L::Int)`

The same as above, but computed directly from the number of total outcomes `L`

.

`ComplexityMeasures.information_normalized`

— Method`information_normalized([e::DiscreteInfoEstimator,] [est::ProbabilitiesEstimator,] o::OutcomeSpace, x) → h::Real`

Estimate the normalized version of the given discrete information measure, This is just the value of `information`

divided its maximum possible value given `o`

.

The same convenience syntaxes as in `information`

can be used here.

Notice that there is no method `information_normalized(e::DiscreteInfoEstimator, probs::Probabilities)`

, because there is no way to know the number of *possible* outcomes (i.e., the `total_outcomes`

) from `probs`

.

**Normalized values**

For the `PlugIn`

estimator, it is guaranteed that `h̃ ∈ [0, 1]`

. For any other estimator, we can't guarantee this, since the estimator might over-correct. You should know what you're doing if using anything but `PlugIn`

to estimate normalized values.

`ComplexityMeasures.invariantmeasure`

— Method```
invariantmeasure(x::AbstractStateSpaceSet, binning::RectangularBinning;
rng = Random.default_rng()) → iv::InvariantMeasure
```

Estimate an invariant measure over the points in `x`

based on binning the data into rectangular boxes dictated by the `binning`

, then approximate the transfer (Perron-Frobenius) operator over the bins. From the approximation to the transfer operator, compute an invariant distribution over the bins. Assumes that the input data are sequential.

Details on the estimation procedure is found the `TransferOperator`

docstring.

**Example**

```
using DynamicalSystems
henon_rule(x, p, n) = SVector{2}(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
henon = DeterministicIteratedMap(henon_rule, zeros(2), [1.4, 0.3])
orbit, t = trajectory(ds, 20_000; Ttr = 10)
# Estimate the invariant measure over some coarse graining of the orbit.
iv = invariantmeasure(orbit, RectangularBinning(15))
# Get the probabilities and bins
invariantmeasure(iv)
```

**Probabilities and bin information**

`invariantmeasure(iv::InvariantMeasure) → (ρ::Probabilities, bins::Vector{<:SVector})`

From a pre-computed invariant measure, return the probabilities and associated bins. The element `ρ[i]`

is the probability of visitation to the box `bins[i]`

.

Why bother with the transfer operator instead of using regular histograms to obtain probabilities?

In fact, the naive histogram approach and the transfer operator approach are equivalent in the limit of long enough time series (as $n \to \intfy$), which is guaranteed by the ergodic theorem. There is a crucial difference, however:

The naive histogram approach only gives the long-term probabilities that orbits visit a certain region of the state space. The transfer operator encodes that information too, but comes with the added benefit of knowing the *transition probabilities* between states (see `transfermatrix`

).

See also: `InvariantMeasure`

.

`ComplexityMeasures.is_counting_based`

— Method`is_counting_based(o::OutcomeSpace)`

Return `true`

if the `OutcomeSpace`

`o`

is counting-based, and `false`

otherwise.

`ComplexityMeasures.ith_order_statistic`

— Function`ith_order_statistic(ex, i::Int, n::Int = length(x))`

Return the i-th order statistic from the order statistics `ex`

, requiring that `Xᵢ = X₁`

if `i < 1`

and `Xᵢ = Xₙ`

if `i > n`

.

`ComplexityMeasures.log_with_base`

— Method`log_with_base(base) → f`

Return a function that computes the logarithm at a given base. This definitely increases accuracy, and probably also performance.

`ComplexityMeasures.maxdists`

— Method`maxdists(xᵢ, nns) → dists`

Compute the maximum distance from `xᵢ`

to the points `xⱼ ∈ nns`

along each dimension, i.e. `dists[k] = max{xᵢ[k], xⱼ[k]}`

for `j = 1, 2, ..., length(x)`

.

`ComplexityMeasures.missing_outcomes`

— Method`missing_outcomes(o::OutcomeSpace, x; all = false) → n::Int`

Count the number of missing outcomes `n`

(i.e., not occuring in the data) specified by `o`

, given input data `x`

. This function only works for count-based outcome spaces, use `missing_probabilities`

otherwise.

See also: `MissingDispersionPatterns`

.

`ComplexityMeasures.missing_probabilities`

— Method`missing_probabilities([est::ProbabilitiesEstimator], o::OutcomeSpace, x)`

Same as `missing_outcomes`

, but defines a "missing outcome" as an outcome having 0 probability according to `est`

.

`ComplexityMeasures.multiscale`

— Function`multiscale(algorithm::MultiScaleAlgorithm, [args...], x)`

A convenience function to compute the multiscale version of any `InformationMeasureEstimator`

or `ComplexityEstimator`

The return type of `multiscale`

is either a `Vector{Real}`

or a `Vector{Vector{Real}}`

, see the available coarse-graining methods below.

It utilizes `downsample`

with the given `algorithm`

to first produce coarse-grained, downsampled versions of `x`

for scale factors `algorithm.scales`

. Then, `information`

or `complexity`

, depending on the input arguments, is applied to each of the coarse-grained timeseries. If `N = length(x)`

, then the length of the most severely downsampled version of `x`

is `N ÷ maximum(algorithm.scales)`

, while for scale factor `1`

, the original time series is considered.

**Description**

This function generalizes the multiscale entropy of Costa2002 to any discrete information measure, any differential information measure, and any other complexity measure.

**Coarse-graining algorithms**

The available downsampling routines are:

`RegularDownsampling`

yields a single`Vector`

per scale.`CompositeDownsampling`

yields a`Vector{Vector}`

per scale.

**Examples**

`multiscale`

can be used with any discrete or differential information measure estimator. For example, here's two ways of computing multiscale Tsallis entropy:

```
using ComplexityMeasures
x = randn(1000)
downsampling = RegularDownsampling(scales = 1:5) # multiscale algorithm
# Symbolic (ordinal-pattern-based) probabilities estimation using Bayesian regularization,
# jackknife estimation of the entropy.
o = OrdinalPatterns{3}(2) # outcome space
probest = BayesianRegularization() # probabilities estimator
hest = Jackknife(Tsallis(q = 1.5)) # entropy estimator
multiscale(downsampling, hest, probest, o, x)
# Differential kNN-based estimator:
hest = LeonenkoProzantoSavani(Tsallis(q = 1.5), k = 10) # 10 neighbors
multiscale(downsampling, hest, x)
```

Multiscale variants of any `ComplexityEstimator`

are also trivial to compute. Let's compute the "generalized multiscale sample entropy Costa2015" using the second-order moment.

```
using ComplexityMeasures, Statistics
multiscale(CompositeDownsampling(; f = Statistics.var), SampleEntropy(x), x)
```

`ComplexityMeasures.multiscale_normalized`

— Function`multiscale_normalized(algorithm::MultiScaleAlgorithm, [args...], x)`

The same as `multiscale`

, but computes the normalized version of the complexity measure.

`ComplexityMeasures.n_borderpoints`

— Method`n_borderpoints(xᵢ, nns, dists) → ξ`

Compute `ξ`

, which is how many of `xᵢ`

's neighbor points `xⱼ ∈ nns`

fall on the border of the minimal-volume rectangle with `xᵢ`

at its center.

`dists[k]`

should be the maximum distance from `xᵢ[k]`

to any other point along the k-th dimension, and `length(dists)`

is the total dimension.

`ComplexityMeasures.ndigits_in_factorial_base`

— MethodCompute how many digits a base-10 integer needs in the factorial number system.

`ComplexityMeasures.outcome_space`

— Method`outcome_space(o::OutcomeSpace, x) → Ω`

Return a sorted container containing all *possible* outcomes of `o`

for input `x`

.

For some estimators the concrete outcome space is known without knowledge of input `x`

, in which case the function dispatches to `outcome_space(o)`

. In general it is recommended to use the 2-argument version irrespectively of estimator.

`ComplexityMeasures.outcomes`

— Method`outcomes(o::OutcomeSpace, x)`

Return all (unique) outcomes that appear in the (encoded) input data `x`

, according to the given `OutcomeSpace`

. Equivalent to `probabilities_and_outcomes(o, x)[2]`

, but for some estimators it may be explicitly extended for better performance.

`ComplexityMeasures.print_array_with_margins`

— Method`print_array_with_margins(io::IO, x::AbstractArray{T, 1}, margin::AbstractVector) where T`

Prints a length-`N`

vector `x`

alongside the `margin`

elements, such that `x[i]`

corresponds to the marginal element `margin[i]`

.

`ComplexityMeasures.probabilities`

— Function```
probabilities(
[est::ProbabilitiesEstimator], o::OutcomeSpace, x::Array_or_SSSet
) → p::Probabilities
```

Compute the same probabilities as in the `probabilities_and_outcomes`

function, with two differences:

- Do not explicitly return the outcomes.
- If the outcomes are not estimated for free while estimating the counts, a special integer type is used to enumerate the outcomes, to avoid the computational cost of estimating the outcomes.

`probabilities([est::ProbabilitiesEstimator], counts::Counts) → (p::Probabilities, Ω)`

The same as above, but estimate the probability directly from a set of `Counts`

.

`ComplexityMeasures.probabilities!`

— Function`probabilities!(s, args...)`

Similar to `probabilities(args...)`

, but allows pre-allocation of temporarily used containers `s`

.

Only works for certain estimators. See for example `OrdinalPatterns`

.

`ComplexityMeasures.probabilities_and_outcomes`

— Function```
probabilities_and_outcomes(
[est::ProbabilitiesEstimator], o::OutcomeSpace, x::Array_or_SSSet
) → (p::Probabilities, Ω)
```

Estimate a probability distribution over the set of possible outcomes `Ω`

defined by the `OutcomeSpace`

`o`

, given input data `x`

. Probabilities are estimated according to the given probabilities estimator `est`

, which defaults to `RelativeAmount`

.

The input data is typically an `Array`

or a `StateSpaceSet`

(or `SSSet`

for short); see Input data for ComplexityMeasures.jl. Configuration options are always given as arguments to the chosen outcome space and probabilities estimator.

Return a tuple where the first element is a `Probabilities`

instance, which is vector-like and contains the probabilities, and where the second element `Ω`

are the outcomes corresponding to the probabilities, such that `p[i]`

is the probability for the outcome `Ω[i]`

.

The outcomes are actually included in `p`

, and you can use the `outcomes`

function on the `p`

to get them. `probabilities_and_outcomes`

returns both for backwards compatibility.

```
probabilities_and_outcomes(
[est::ProbabilitiesEstimator], counts::Counts
) → (p::Probabilities, Ω)
```

Estimate probabilities from the pre-computed `counts`

using the given `ProbabilitiesEstimator`

`est`

.

**Description**

Probabilities are computed by:

- Discretizing/encoding
`x`

into a finite set of outcomes`Ω`

specified by the provided`OutcomeSpace`

`o`

. - Assigning to each outcome
`Ωᵢ ∈ Ω`

either a count (how often it appears among the discretized data points), or a pseudo-count (some pre-normalized probability such that`sum(Ωᵢ for Ωᵢ in Ω) == 1`

).

For outcome spaces that result in pseudo counts, such as `PowerSpectrum`

, these pseudo counts are simply treated as probabilities and returned directly (that is, `est`

is ignored). For counting-based outcome spaces (see `OutcomeSpace`

docstring), probabilities are estimated from the counts using some `ProbabilitiesEstimator`

(first signature).

**Observed vs all probabilities**

Due to performance optimizations, whether the returned probabilities contain `0`

s as entries or not depends on the outcome space. E.g., in `ValueBinning`

`0`

s are skipped, while in `PowerSpectrum`

`0`

are not skipped, because we get them for free.

Use `allprobabilities_and_outcomes`

to guarantee that zero probabilities are also returned (may be slower).

`ComplexityMeasures.relevant_fieldnames`

— Method`relevant_fieldnames(x::T) → names::Vector{Symbol}`

Internal method that returns the relevant field names to be printed for type `T`

. For example, for `Encodings`

, the implementation is simply

`relevant_fieldnames(e::Encoding) = fieldnames(typeof(e))`

Individual types can override this method if special printing is desired.

`ComplexityMeasures.sample_entropy_probs`

— Method`sample_entropy_probs(x; k::Int = 2, m::Int = 2, τ::Int = 1, r = 0.2 * Statistics.std(x))`

Compute the probabilities required for `entropy_sample`

. `k`

is the embedding dimension, `τ`

is the embedding lag, and `m`

is a normalization constant (so that we consider the same number of points for both the `m`

-dimensional and the `m+1`

-dimensional embeddings), and `r`

is the radius.

`ComplexityMeasures.special_typeparameter_info`

— Method`special_typeparameter_info(::Type{T})`

Returns a string containing any extra type parameter information to be printed for a given type.

Defaults to nothing, but for types like `OrdinalPatterns`

, we'd like to include the display the type parameter m as `OrdinalPatterns{m}(...)`

.

`ComplexityMeasures.stencil_length`

— Function```
stencil_length(stencil::Vector{CartesianIndex{D}}) where D → length(stencil)
stencil_length(stencil::NTuple{2, NTuple{D, T}}) where {D, T} → prod(stencil[1])
stencil_length(stencil::Array{Int, D}) where D → sum(stencil)
```

Count the number of elements in the `stencil`

.

`ComplexityMeasures.total_outcomes`

— Method`total_outcomes(o::OutcomeSpace, x)`

Return the length (cardinality) of the outcome space $\Omega$ of `est`

.

For some `OutcomeSpace`

, the cardinality is known without knowledge of input `x`

, in which case the function dispatches to `total_outcomes(est)`

. In general it is recommended to use the 2-argument version irrespectively of estimator.

`ComplexityMeasures.transfermatrix`

— Method`transfermatrix(iv::InvariantMeasure) → (M::AbstractArray{<:Real, 2}, bins::Vector{<:SVector})`

Return the transfer matrix/operator and corresponding bins. Here, `bins[i]`

corresponds to the i-th row/column of the transfer matrix. Thus, the entry `M[i, j]`

is the probability of jumping from the state defined by `bins[i]`

to the state defined by `bins[j]`

.

See also: `TransferOperator`

.

`ComplexityMeasures.transferoperator`

— Method```
transferoperator(pts::AbstractStateSpaceSet,
binning::RectangularBinning) → TransferOperatorApproximationRectangular
```

Estimate the transfer operator given a set of sequentially ordered points subject to a rectangular partition given by the `binning`

.

`ComplexityMeasures.type_field_printcolor`

— Method`type_field_printcolor(x) → color::Symbol`

The color in which to print the field names for a type of type `typeof(x)`

.

Used in combination with `type_printcolor`

to make nested printing look better.

`ComplexityMeasures.type_printcolor`

— Method`type_printcolor(x) → color::Symbol`

The color in which to print the type name for type `typeof(x)`

.

Can be used to distinguish different types, so that nested printing looks better (e.g. `Encoding`

s inside `OutcomeSpace`

s).

`ComplexityMeasures.volume_minimal_rect`

— Method`volume_minimal_rect(xᵢ, nns) → vol`

Compute the volume of the minimal enclosing rectangle with `xᵢ`

at its center and containing all points `nᵢ ∈ nns`

either within the rectangle or on one of its borders.

This function respects the coordinate system of the input data, i.e. it does not perform any rotation (which would be computationally more demanding because we'd need to find the convex hull of `nns`

, but could potentially give more accurate results).

`ComplexityMeasures.volume_minimal_rect`

— Method`volume_minimal_rect(dists) → vol`

Compute the volume of a (hyper)-rectangle where the distance from its centre along the `k`

-th dimension is given by `dists[k]`

, and `length(dists)`

is the total dimension.

`ComplexityMeasures.GroupSlices.firstinds`

— Method```
firstinds(ic::Vector{Int})
firstinds(ib::Vector{Vector{Int}})
```

Returns a vector of integers containing the first index position of each unique value in the input integer vector `ic`

, or the first index position of each entry in the input vector of integer vectors `ib`

. When operating on the output returned from `unique(A, dim)`

, the returned vector of integers correspond to the positions of the first of each unique slice present in the original input multidimensional array `A`

along dimension `dim`

. The implementation of `firstinds`

accepting a vector of integers operates on the output returned from `groupslices(A, dim)`

. The implementation of `firstinds`

accepting a vector of vector of integers operates on the output returned from `groupinds(ic::Vector{Int})`

.

`ComplexityMeasures.GroupSlices.groupinds`

— Method`groupinds(ic)`

Returns a vector of vectors of integers wherein the vector of group slice index integers as returned from `groupslices(A, dim)`

is converted into a grouped vector of vectors. Each vector entry in the returned vector of vectors contains all of the positional indices of slices in the original input array `A`

that correspond to the unique slices along dimension `dim`

that are present in the array `C`

as returned from `unique(A, dim)`

.

`ComplexityMeasures.GroupSlices.groupslices`

— Method`groupslices(V::AbstractVector)`

Returns a vector of integers the same length as `V`

, which in place of each entry `x`

has the index of the first entry of `V`

which is equal to `x`

. This is true:

`all(x == V[i] for (x,i) in zip(V, groupslices(V)))`

`ComplexityMeasures.GroupSlices.groupslices`

— Method`groupslices(A; dims) = groupslices(A, dims)`

Returns a vector of integers where each integer element of the returned vector is a group number corresponding to the unique slices along dimension `dims`

as returned from `unique(A; dims=d)`

, where `A`

can be a multidimensional array. The default is `dims = 1`

. Example usage: If `C = unique(A; dims=dims)`

, `ic = groupslices(A, dims)`

, and `ndims(A) == ndims(C) == 3`

, then:

```
if dims == 1
all(A .== C[ic,:,:])
elseif dims == 2
all(A .== C[:,ic,:])
elseif dims == 3
all(A .== C[:,:,ic])
end
```

`ComplexityMeasures.GroupSlices.lastinds`

— Method`lastinds(ic::Vector{Int})`

Returns a vector of integers containing the last index position of each unique value in the input integer vector `ic`

. When operating on the output returned from `groupinds(unique(A, dim))`

, the returned vector of integers correspond to the positions of the last of each unique slice present in the original input multidimensional array `A`

along dimension `dim`

. The implementation of `firstinds`

accepting a vector of vector of integers operates on the output returned from `groupinds(ic::Vector{Int})`

.