DelayEmbeddings.DelayEmbeddingsModule

DelayEmbeddings.jl

docsdev docsstable CI codecov Package Downloads

A Julia package that provides a generic interface for performing delay coordinate embeddings, as well as cutting edge algorithms for creating optimal embeddings given some data. It can be used as a standalone package, or as part of DynamicalSystems.jl.

To install it, run import Pkg; Pkg.add("DelayEmbeddings").

All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.

DelayEmbeddings.DelayEmbeddingType
DelayEmbedding(γ, τ, h = nothing) → `embedding`

Return a delay coordinates embedding structure to be used as a function-like-object, given a timeseries and some index. Calling

embedding(s, n)

will create the n-th delay vector of the embedded space, which has γ temporal neighbors with delay(s) τ. γ is the embedding dimension minus 1, τ is the delay time(s) while h are extra weights, as in embed for more.

Be very careful when choosing n, because @inbounds is used internally. Use τrange!

DelayEmbeddings.GeneralizedEmbeddingType
GeneralizedEmbedding(τs, js = ones(length(τs)), ws = nothing) -> `embedding`

Return a delay coordinates embedding structure to be used as a function. Given a timeseries or trajectory (i.e. StateSpaceSet) s and calling

embedding(s, n)

will create the delay vector of the n-th point of s in the embedded space using generalized embedding (see genembed).

js is ignored for timeseries input s (since all entries of js must be 1 in this case) and in addition js defaults to (1, ..., 1) for all τ.

Be very careful when choosing n, because @inbounds is used internally. Use τrange!

DelayEmbeddings._bisectMethod
_bisect(s)

Create a partition histogram of the sorted series s with a partition of its space defined by a recursive bisection method. The first level partition divides s in two segments with equal number of points; each partition is divided into two further sub-pantitions, etc., until the distribution of the points in the highest level subpartition is homogeneous.

DelayEmbeddings._equalbinsMethod
_equalbins(s[; nbins, binwidth])

Create a histogram of the sorted series s with bins of the same width. Either the number of bins (nbins) or their width (binwidth) must be given as keyword argument (but not both).

DelayEmbeddings._frequencies!Method
_frequencies!(f, s, edges)

Calculate a histogram of values of s along the bins defined by edges. Both s and edges must be sorted ascendingly. The frequencies (counts) of s in each bin will be stored in the pre-allocated vector f.

DelayEmbeddings._selfmutualinfo!Method
_selfmutualinfo!(f, sτ, edges, bins0)

Calculate the mutual information between the distribution of the delayed time series and its original image.

The two series are partitioned in a joint histogram with axes divided by the points given in edges; the distribution of the original image is given by bins0. The values of must be arranged such that all the points of the bin (1,j) are contained in the first bins0[1] positions, the points of the bin (2,j) are contained in the followingbins[2]` positions, etc.

The vector f is used as a placeholder to pre-allocate the histogram.

DelayEmbeddings.all_neighborsMethod
all_neighbors(A::StateSpaceSet, stype, w = 0) → idxs, dists
all_neighbors(vtree, vs, ns, K, w)

Return the maximum(K)-th nearest neighbors for all input points vs, with indices ns in original data, while respecting the theiler window w.

This function is nothing more than a convinience call to Neighborhood.bulksearch.

It is an internal, convenience function.

DelayEmbeddings.beta_statisticMethod
beta_statistic(Y::StateSpaceSet, s::Vector) [, τs, w]) → β

Compute the β-statistic β for input state space trajectory Y and a timeseries s according to Nichkawde [Nichkawde2013], based on estimating derivatives on a projected manifold. For a range of delay values τs, β gets computed and its maximum over all considered τs serves as the optimal delay considered in this embedding cycle.

Arguments τs, w as in mdop_embedding.

Description

The β-statistic is based on the geometrical idea of maximal unfolding of the reconstructed attractor and is tightly related to the False Nearest Neighbor method ([Kennel1992]). In fact the method eliminates the maximum amount of false nearest neighbors in each embedding cycle. The idea is to estimate the absolute value of the directional derivative with respect to a possible new dimension in the reconstruction process, and with respect to the nearest neighbor, for all points of the state space trajectory:

ϕ'(τ) = Δϕd(τ) / Δxd

Δxd is simply the Euclidean nearest neighbor distance for a reference point with respect to the given Theiler window w. Δϕd(τ) is the distance of the reference point to its nearest neighbor in the one dimensional time series s, for the specific τ. Δϕ_d(τ) = |s(i+τ)-s(j+τ)|, with i being the index of the considered reference point and j the index of its nearest neighbor.

Finally,

β = log β(τ) = ⟨log₁₀ ϕ'(τ)⟩ ,

with ⟨.⟩ being the mean over all reference points. When one chooses the maximum of β over all considered τ's, one obtains the optimal delay value for this embedding cycle. Note that in the first embedding cycle, the input state space trajectory Y can also be just a univariate time series.

DelayEmbeddings.choose_optimal_tau1Method

This function chooses the optimal τ value as the maximum of the β-statistics maxima. It returns the index of this maximum as well as the time series number corresponding to that maximum of maxima. In this first embedding cycle it also returns the time series number, which act as Y_act.

DelayEmbeddings.choose_optimal_tau1_garciaMethod

This function chooses the optimal τ value as the 1st minimum of all the N-statistics 1st minima. It returns the index of this 1st minimum as well as the time series number corresponding to that 1st minimum of all 1st minima. In this first embedding cycle it also returns the time series number, which act as Y_act.

DelayEmbeddings.choose_optimal_tau2Method

This function chooses the optimal τ value as the maximum of the β-statistics maxima. It returns the index of this maximum as well as the time series number corresponding to that maximum of maxima.

DelayEmbeddings.choose_optimal_tau2_garciaMethod

This function chooses the optimal τ value as the 1st minimum of the 1st minima of the N-statistics. It returns the index of this 1st minimum as well as the time series number corresponding to that 1st minimum of the 1st minima.

DelayEmbeddings.choose_right_embedding_paramsMethod
Choose the right embedding parameters of the ε★-statistic in the first

embedding cycle. Return the L-decrease-value, the corresponding index value of the chosen peak τ_idx and the number of the chosen time series to start with idx.

DelayEmbeddings.comp_Ek2!Method
comp_Ek2!(ϵ_ball,u_k,Y,v,NNidxs,T,K,metric) → E²(T)

Returns the approximated conditional variance for a specific point in state space ns (index value) with its K-nearest neighbors, which indices are stored in NNidxs, for a time horizon T. This corresponds to Eqs. 13 & 14 in [Uzal2011]. The norm specified in input parameter metric is used for distance computations.

DelayEmbeddings.comp_Ek2!Method
comp_Ek2!(ϵ_ball,u_k,Y,v,NNidxs,T,K,metric) → E²(T)

Returns the approximated conditional variance for a specific point in state space ns (index value) with its K-nearest neighbors, which indices are stored in NNidxs, for a time horizon T. This corresponds to Eqs. 13 & 14 in [Uzal2011]. The norm specified in input parameter metric is used for distance computations.

DelayEmbeddings.delay_afnnMethod
delay_afnn(s::AbstractVector, τ:Int, ds = 2:6; metric=Euclidean(), w = 0) → E₁

Compute the parameter E₁ of Cao's "averaged false nearest neighbors" method for determining the minimum embedding dimension of the time series s, with a sequence of τ-delayed temporal neighbors.

Description

Given the scalar timeseries s and the embedding delay τ compute the values of E₁ for each embedding dimension d ∈ ds, according to Cao's Method (eq. 3 of[Cao1997]).

This quantity is a ratio of the averaged distances between the nearest neighbors of the reconstructed time series, which quantifies the increment of those distances when the embedding dimension changes from d to d+1.

Return the vector of all computed E₁s. To estimate a good value for d from this, find d for which the value E₁ saturates at some value around 1.

Note: This method does not work for datasets with perfectly periodic signals.

w is the Theiler window.

See also: optimal_separated_de and stochastic_indicator.

DelayEmbeddings.delay_f1nnFunction
delay_f1nn(s::AbstractVector, τ::Int, ds = 2:6; metric = Euclidean())

Calculate the ratio of "false first nearest neighbors" (FFNN) of the datasets created from s with embed(s, d, τ) for d ∈ ds.

Description

Given a dataset made by embed(s, d, τ) the "false first nearest neighbors" (FFNN) are the pairs of points that are nearest to each other at dimension d that cease to be nearest neighbors at dimension d+1.

The returned value is a vector with the ratio between the number of FFNN and the number of points in the dataset for each d ∈ ds. The optimal value for d is found at the point where this ratio approaches zero.

See also: optimal_separated_de.

DelayEmbeddings.delay_fnnFunction
delay_fnn(s::AbstractVector, τ:Int, ds = 2:6; rtol=10.0, atol=2.0) → FNNs

Calculate the number of "false nearest neighbors" (FNNs) of the datasets created from s with embed(s, d, τ) for d ∈ ds.

Description

Given a dataset made by embed(s, d, τ) the "false nearest neighbors" (FNN) are the pairs of points that are nearest to each other at dimension d, but are separated at dimension d+1. Kennel's criteria for detecting FNN are based on a threshold for the relative increment of the distance between the nearest neighbors (rtol, eq. 4 in[Kennel1992]), and another threshold for the ratio between the increased distance and the "size of the attractor" (atol, eq. 5 in[Kennel1992]). These thresholds are given as keyword arguments.

The returned value is a vector with the number of FNN for each γ ∈ γs. The optimal value for γ is found at the point where the number of FNN approaches zero.

See also: optimal_separated_de.

DelayEmbeddings.delay_ifnnMethod
delay_ifnn(s::Vector, τ::Int, ds = 1:10; kwargs...) → `FNNs`

Compute and return the FNNs-statistic for the time series s and a uniform time delay τ and embedding dimensions ds after [Hegger1999]. In this notation γ ∈ γs = d-1, if d is the embedding dimension. This fraction tends to 0 when the optimal embedding dimension with an appropriate lag is reached.

Keywords

*r = 2: Obligatory threshold, which determines the maximum tolerable spreading of trajectories in the reconstruction space. *metric = Euclidean: The norm used for distance computations. *w = 1 = The Theiler window.

See also: optimal_separated_de.

DelayEmbeddings.embedMethod
embed(s, d, τ [, h])

Embed s using delay coordinates with embedding dimension d and delay time τ and return the result as a StateSpaceSet. Optionally use weight h, see below.

Here τ > 0, use genembed for a generalized version.

Description

If τ is an integer, then the $n$-th entry of the embedded space is

\[(s(n), s(n+\tau), s(n+2\tau), \dots, s(n+(d-1)\tau))\]

If instead τ is a vector of integers, so that length(τ) == d-1, then the $n$-th entry is

\[(s(n), s(n+\tau[1]), s(n+\tau[2]), \dots, s(n+\tau[d-1]))\]

The resulting set can have same invariant quantities (like e.g. Lyapunov exponents) with the original system that the timeseries were recorded from, for proper d and τ. This is known as the Takens embedding theorem [Takens1981] [Sauer1991]. The case of different delay times allows embedding systems with many time scales, see[Judd1998].

If provided, h can be weights to multiply the entries of the embedded space. If h isa Real then the embedding is

\[(s(n), h \cdot s(n+\tau), h^2 \cdot s(n+2\tau), \dots,h^{d-1} \cdot s(n+γ\tau))\]

Otherwise h can be a vector of length d-1, which the decides the weights of each entry directly.

References

[Takens1981] : F. Takens, Detecting Strange Attractors in Turbulence — Dynamical Systems and Turbulence, Lecture Notes in Mathematics 366, Springer (1981)

[Sauer1991] : T. Sauer et al., J. Stat. Phys. 65, pp 579 (1991)

DelayEmbeddings.embedding_cycle!Method

Perform an embedding cycle of the multivariate embedding, but the 1st. Return the actual reconstruction vector Y_act and all nearest neighbor distances NNdist_new of Y_act.

DelayEmbeddings.embedding_cycle_MDOP!Method

Computes the actual reconstruction vectors Y_act and the according nearest neighbour distances NNdist_new for every embedding cycle of the multivariate MDOP-embedding procedure, but the first one.

DelayEmbeddings.estimate_delayFunction
estimate_delay(s, method::String [, τs = 1:100]; kwargs...) -> τ

Estimate an optimal delay to be used in embed. The method can be one of the following:

  • "ac_zero" : first delay at which the auto-correlation function becomes <0.
  • "ac_min" : delay of first minimum of the auto-correlation function.
  • "mi_min" : delay of first minimum of mutual information of s with itself (shifted for various τs). Keywords nbins, binwidth are propagated into selfmutualinfo.
  • "exp_decay" : exponential_decay_fit of the correlation function rounded to an integer (uses least squares on c(t) = exp(-t/τ) to find τ).
  • "exp_extrema" : same as above but the exponential fit is done to the absolute value of the local extrema of the correlation function.

Both the mutual information and correlation function (autocor) are computed only for delays τs. This means that the min methods can never return the first value of τs!

The method mi_min is significantly more accurate than the others and also returns good results for most timeseries. It is however the slowest method (but still quite fast!).

DelayEmbeddings.exponential_decay_fitFunction
exponential_decay_fit(x, y, weight = :equal) -> τ

Perform a least square fit of the form y = exp(-x/τ) and return τ. Taken from: http://mathworld.wolfram.com/LeastSquaresFittingExponential.html. Assumes equal lengths of x, y and that y ≥ 0.

To use the method that gives more weight to small values of y, use weight = :small.

DelayEmbeddings.find_fnn_optimalMethod

Helper function for selecting the appropriate embedding dimension from the statistic when using Kennel's, Hegger's or Krakovskas's method.

DelayEmbeddings.findlocalextremaMethod
findlocalextrema(y) -> max_ind, min_ind

Find the local extrema of given array y, by scanning point-by-point. Return the indices of the maxima (max_ind) and the indices of the minima (min_ind).

DelayEmbeddings.findlocalminimaMethod
findlocalminima(s)

Return the indices of the local minima the timeseries s. If none exist, return the index of the minimum (as a vector). Starts of plateaus are also considered local minima.

DelayEmbeddings.first_embedding_cycle!Method

Perform the first embedding cycle of the multivariate embedding. Return the actual reconstruction vector Y_act and all nearest neighbor distances NNdist_new of Y_act.

DelayEmbeddings.first_embedding_cycle_MDOP!Method

Computes the actual reconstruction vectors Y_act and the according nearest neighbour distances NNdist_new for the very first embedding cycle of the multivariate MDOP-embedding procedure.

DelayEmbeddings.fnn_embedding_cycleFunction
fnn_embedding_cycle(NNdist, NNdistnew, r=2) -> FNNs

Compute the amount of false nearest neighbors FNNs, when adding another component to a given (vector-) time series. This new component is the τ-lagged version of a univariate time series. NNdist is storing the distances of the nearest neighbor for all considered fiducial points and NNdistnew is storing the distances of the nearest neighbor for each fiducial point in one embedding dimension higher using a given τ. The obligatory threshold r is by default set to 2.

DelayEmbeddings.garcia_almeida_embeddingMethod
garcia_almeida_embedding(s; kwargs...) → Y, τ_vals, ts_vals, FNNs ,NS

A unified approach to properly embed a time series (Vector type) or a set of time series (StateSpaceSet type) based on the papers of Garcia & Almeida [Garcia2005a],[Garcia2005b].

Keyword arguments

  • τs= 0:50: Possible delay values τs (in sampling time units). For each of the τs's the N-statistic gets computed.
  • w::Int = 1: Theiler window (neighbors in time with index w close to the point, that are excluded from being true neighbors). w=0 means to exclude only the point itself, and no temporal neighbors.
  • r1 = 10: The threshold, which defines the factor of tolerable stretching for the d_E1-statistic.
  • r2 = 2: The threshold for the tolerable relative increase of the distance between the nearest neighbors, when increasing the embedding dimension.
  • fnn_thres= 0.05: A threshold value defining a sufficiently small fraction of false nearest neighbors, in order to the let algorithm terminate and stop the embedding procedure (`0 ≤ fnn_thres < 1).
  • T::Int = 1: The forward time step (in sampling units) in order to compute the d_E2-statistic (see algorithm description). Note that in the paper this is not a free parameter and always set to T=1.
  • metric = Euclidean(): metric used for finding nearest neigbhors in the input phase space trajectory Y.
  • max_num_of_cycles = 50: The algorithm will stop after that many cycles no matter what.

Description

The method works iteratively and gradually builds the final embedding vectors Y. Based on the N-statistic the algorithm picks an optimal delay value τ for each embedding cycle as the first local minimum of N. In case of multivariate embedding, i.e. when embedding a set of time series (s::StateSpaceSet), the optimal delay value τ is chosen as the first minimum from all minimum's of all considered N-statistics for each embedding cycle. The range of considered delay values is determined in τs and for the nearest neighbor search we respect the Theiler window w. After each embedding cycle the FNN-statistic FNNs [Hegger1999][Kennel1992] is being checked and as soon as this statistic drops below the threshold fnn_thres, the algorithm breaks. In order to increase the practability of the method the algorithm also breaks, when the FNN-statistic FNNs increases . The final embedding vector is stored in Y (StateSpaceSet). The chosen delay values for each embedding cycle are stored in the τ_vals and the according time series number chosen for the according delay value in τ_vals is stored in ts_vals. For univariate embedding (s::Vector) ts_vals is a vector of ones of length τ_vals, because there is simply just one time series to choose from. The function also returns the N-statistic NS for each embedding cycle as an Array of Vectors.

Notice that we were not able to reproduce the figures from the papers with our implementation (which nevertheless we believe is the correct one).

DelayEmbeddings.genembedFunction
genembed(s, τs, js = ones(...); ws = nothing) → ssset

Create a generalized embedding of s which can be a timeseries or arbitrary StateSpaceSet, and return the result as a new StateSpaceSet.

The generalized embedding works as follows:

  • τs denotes what delay times will be used for each of the entries of the delay vector. It is recommended that τs[1] = 0. τs is allowed to have negative entries as well.
  • js denotes which of the timeseries contained in s will be used for the entries of the delay vector. js can contain duplicate indices.
  • ws are optional weights that weight each embedded entry (the i-th entry of the delay vector is weighted by ws[i]). If provided, it is recommended that ws[1] == 1.

τs, js, ws are tuples (or vectors) of length D, which also coincides with the embedding dimension. For example, imagine input trajectory $s = [x, y, z]$ where $x, y, z$ are timeseries (the columns of the StateSpaceSet). If js = (1, 3, 2) and τs = (0, 2, -7) the created delay vector at each step $n$ will be

\[(x(n), z(n+2), y(n-7))\]

Using ws = (1, 0.5, 0.25) as well would create

\[(x(n), \frac{1}{2} z(n+2), \frac{1}{4} y(n-7))\]

js can be skipped, defaulting to index 1 (first timeseries) for all delay entries, while it has no effect if s is a timeseries instead of a StateSpaceSet.

See also embed. Internally uses GeneralizedEmbedding.

DelayEmbeddings.get_binomial_tableMethod
get_binomial_table(p, α; trial_range::Int=8) -> `δ_to_ε_amount`, Dict(δ_points => ϵ_points)

compute the numbers of points from the δ-neighborhood, which need to fall outside the ϵ-neighborhood, in order to reject the Null Hypothesis at a significance level α. One parameter of the binomial distribution is p, the other one would be the number of trials, i.e. the considered number of points of the δ-neighborhood. trial_range determines the number of considered δ-neighborhood-points, always starting from 8. For instance, if trial_range=8 (Default), then δ-neighborhood sizes from 8 up to 15 are considered. Return δ_to_ε_amount, a dictionary with δ_points as keys and the corresponding number of points in order to reject the Null, ϵ_points, constitute the values.

DelayEmbeddings.hcat_lagged_valuesMethod
hcat_lagged_values(Y, s::Vector, τ::Int) -> Z

Add the τ lagged values of the timeseries s as additional component to Y (Vector or StateSpaceSet), in order to form a higher embedded dataset Z. The dimensionality of Z is thus equal to that of Y + 1.

DelayEmbeddings.local_L_statisticsMethod
Return the maximum decrease of the L-statistic `L_decrease` and corresponding

delay-indices max_idx for all local maxima in ε★

DelayEmbeddings.mdop_embeddingMethod
mdop_embedding(s::Vector; kwargs...) → Y, τ_vals, ts_vals, FNNs, βS

MDOP (for "maximizing derivatives on projection") is a unified approach to properly embed a timeseries or a set of timeseries (StateSpaceSet) based on the paper of Chetan Nichkawde [Nichkawde2013].

Keyword arguments

  • τs= 0:50: Possible delay values τs. For each of the τs's the β-statistic gets computed.
  • w::Int = 1: Theiler window (neighbors in time with index w close to the point, that are excluded from being true neighbors). w=0 means to exclude only the point itself, and no temporal neighbors.
  • fnn_thres::Real= 0.05: A threshold value defining a sufficiently small fraction of false nearest neighbors, in order to the let algorithm terminate and stop the embedding procedure (`0 ≤ fnn_thres < 1).
  • r::Real = 2: The threshold for the tolerable relative increase of the distance between the nearest neighbors, when increasing the embedding dimension.
  • max_num_of_cycles = 50: The algorithm will stop after that many cycles no matter what.

Description

The method works iteratively and gradually builds the final embedding Y. Based on the beta_statistic the algorithm picks an optimal delay value τ for each embedding cycle as the global maximum of β. In case of multivariate embedding, i.e. when embedding a set of time series (s::StateSpaceSet), the optimal delay value τ is chosen as the maximum from all maxima's of all considered β-statistics for each possible timeseries. The range of considered delay values is determined in τs and for the nearest neighbor search we respect the Theiler window w.

After each embedding cycle the FNN-statistic FNNs [Hegger1999][Kennel1992] is being checked and as soon as this statistic drops below the threshold fnn_thres, the algorithm terminates. In order to increase the practability of the method the algorithm also terminates when the FNN-statistic FNNs increases.

The final embedding is returned as Y. The chosen delay values for each embedding cycle are stored in the τ_vals and the according timeseries index chosen for the the respective according delay value in τ_vals is stored in ts_vals. βS, FNNs are returned for clarity and double-checking, since they are computed anyway. In case of multivariate embedding, βS will store all β-statistics for all available time series in each embedding cycle. To double-check the actual used β-statistics in an embedding cycle 'k', simply βS[k][:,ts_vals[k+1]].

DelayEmbeddings.mdop_maximum_delayMethod
mdop_maximum_delay(s, tw = 1:50, samplesize = 1.0)) -> τ_max, L

Compute an upper bound for the search of optimal delays, when using mdop_embedding mdop_embedding or beta_statistic beta_statistic.

Description

The input time series s gets embedded with unit lag and increasing dimension, for dimensions (or time windows) tw (RangeObject). For each of such a time window the L-statistic from Uzal et al. [Uzal2011] will be computed. samplesize determines the fraction of points to be considered in the computation of L (see uzal_cost). When this statistic reaches its global minimum the maximum delay value τ_max gets returned. When s is a multivariate StateSpaceSet, τ_max will becomputed for all timeseries of that StateSpaceSet and the maximum value will be returned. The returned L-statistic has size (length(tw), size(s,2)).

DelayEmbeddings.n_statisticMethod
n_statistic(Y, s; kwargs...) → N, d_E1

Perform one embedding cycle according to the method proposed in [Garcia2005a] for a given phase space trajectory Y (of type StateSpaceSet) and a time series s (of typeVector). Return the proposed N-StatisticNand all nearest neighbor distancesd_E1for each point of the input phase space trajectoryY. Note thatY` is a single time series in case of the first embedding cycle.

Keyword arguments

  • τs= 0:50: Considered delay values τs (in sampling time units). For each of the τs's the N-statistic gets computed.
  • r = 10: The threshold, which defines the factor of tolerable stretching for the d_E1-statistic (see algorithm description).
  • T::Int = 1: The forward time step (in sampling units) in order to compute the d_E2-statistic (see algorithm description). Note that in the paper this is not a free parameter and always set to T=1.
  • w::Int = 0: Theiler window (neighbors in time with index w close to the point, that are excluded from being true neighbors). w=0 means to exclude only the point itself, and no temporal neighbors. Note that in the paper this is not a free parameter and always w=0.
  • metric = Euclidean(): metric used for finding nearest neigbhors in the input phase space trajectory Y.

Description

For a range of possible delay values τs one constructs a temporary embedding matrix. That is, one concatenates the input phase space trajectory Y with the τ-lagged input time series s. For each point on the temporary trajectory one computes its nearest neighbor, which is denoted as the d_E1-statistic for a specific τ. Now one considers the distance between the reference point and its nearest neighbor T sampling units ahead and calls this statistic d_E2. [Garcia2005a] strictly use T=1, so they forward each reference point and its corresponding nearest neighbor just by one (!) sampling unit. Here it is a free parameter.

The N-statistic is then the fraction of d_E2/d_E1-pairs which exceed a threshold r.

Plotted vs. the considered τs-values it is proposed to pick the τ-value for this embedding cycle as the value, where N has its first local minimum.

DelayEmbeddings.optimal_separated_deFunction
optimal_separated_de(s, method = "afnn", dmethod = "mi_min"; kwargs...) → 𝒟, τ, E

Produce an optimal delay embedding 𝒟 of the given timeseries s by using the separated approach of first finding an optimal (and constant) delay time using estimate_delay with the given dmethod, and then an optimal embedding dimension, by calculating an appropriate statistic for each dimension d ∈ 1:dmax. Return the embedding 𝒟, the optimal delay time τ (the optimal embedding dimension d is just size(𝒟, 2)) and the actual statistic E used to estimate optimal d.

Notice that E is a function of the embedding dimension, which ranges from 1 to dmax.

For calculating E to estimate the dimension we use the given method which can be:

  • "afnn" (default) is Cao's "Averaged False Nearest Neighbors" method[Cao1997], which gives a ratio of distances between nearest neighbors.
  • "ifnn" is the "Improved False Nearest Neighbors" from Hegger & Kantz[Hegger1999], which gives the fraction of false nearest neighbors.
  • "fnn" is Kennel's "False Nearest Neighbors" method[Kennel1992], which gives the number of points that cease to be "nearest neighbors" when the dimension increases.
  • "f1nn" is Krakovská's "False First Nearest Neighbors" method[Krakovská2015], which gives the ratio of pairs of points that cease to be "nearest neighbors" when the dimension increases.

For more details, see individual methods: delay_afnn, delay_ifnn, delay_fnn, delay_f1nn.

Careful in automated methods

While this method is automated if you want to be really sure of the results, you should directly calculate the statistic and plot its values versus the dimensions.

Keyword arguments

The keywords

τs = 1:100, dmax = 10

denote which delay times and embedding dimensions ds ∈ 1:dmax to consider when calculating optimal embedding. The keywords

slope_thres = 0.05, stoch_thres = 0.1, fnn_thres = 0.05

are specific to this function, see Description below. All remaining keywords are propagated to the low level functions:

w, rtol, atol, τs, metric, r

Description

We estimate the optimal embedding dimension based on the given delay time gained from dmethod as follows: For Cao's method the optimal dimension is reached, when the slope of the E₁-statistic (output from "afnn") falls below the threshold slope_thres and the according stochastic test turns out to be false, i.e. if the E₂-statistic's first value is < 1 - stoch_thres.

For all the other methods we return the optimal embedding dimension when the corresponding FNN-statistic (output from "ifnn", "fnn" or "f1nn") falls below the fnn-threshold fnn_thres AND the slope of the statistic falls below the threshold slope_thres. Note that with noise contaminated time series, one might need to adjust fnn_thres according to the noise level.

See also the file test/compare_different_dimension_estimations.jl for a comparison.

DelayEmbeddings.pecoraMethod
pecora(s, τs, js; kwargs...) → ⟨ε★⟩, ⟨Γ⟩

Compute the (average) continuity statistic ⟨ε★⟩ and undersampling statistic ⟨Γ⟩ according to Pecora et al.[Pecoral2007] (A unified approach to attractor reconstruction), for a given input s (timeseries or StateSpaceSet) and input generalized embedding defined by (τs, js), according to genembed. The continuity statistic represents functional independence between the components of the existing embedding and one additional timeseries. The returned results are matrices with size TxJ.

Keyword arguments

  • delays = 0:50: Possible time delay values delays (in sampling time units). For each of the τ's in delays the continuity-statistic ⟨ε★⟩ gets computed. If undersampling = true (see further down), also the undersampling statistic ⟨Γ⟩ gets returned for all considered delay values.
  • J = 1:dimension(s): calculate for all timeseries indices in J. If input s is a timeseries, this is always just 1.
  • samplesize::Real = 0.1: determine the fraction of all phase space points (=length(s)) to be considered (fiducial points v) to average ε★ to produce ⟨ε★⟩, ⟨Γ⟩
  • K::Int = 13: the amount of nearest neighbors in the δ-ball (read algorithm description). Must be at least 8 (in order to gurantee a valid statistic). ⟨ε★⟩ is computed taking the minimum result over all k ∈ K.
  • metric = Chebyshev(): metrix with which to find nearest neigbhors in the input embedding (ℝᵈ space, d = length(τs)).
  • w = 1: Theiler window (neighbors in time with index w close to the point, that are excluded from being true neighbors). w=0 means to exclude only the point itself, and no temporal neighbors.
  • undersampling = false : whether to calculate the undersampling statistic or not (if not, zeros are returned for ⟨Γ⟩). Calculating ⟨Γ⟩ is thousands of times slower than ⟨ε★⟩.
  • db::Int = 100: Amount of bins used into calculating the histograms of each timeseries (for the undersampling statistic).
  • α::Real = 0.05: The significance level for obtaining the continuity statistic
  • p::Real = 0.5: The p-parameter for the binomial distribution used for the computation of the continuity statistic.

Description

Notice that the full algorithm is too large to discuss here, and is written in detail (several pages!) in the source code of pecora.

DelayEmbeddings.pecuzal_embeddingMethod
pecuzal_embedding(s; kwargs...) → Y, τ_vals, ts_vals, ΔLs, ⟨ε★⟩

A unified approach to properly embed a timeseries or a set of timeseries (StateSpaceSet) based on the recent PECUZAL algorithm due to Kraemer et al.[Kraemer2021].

For more details, see the description below.

Keyword arguments

  • τs = 0:50: Possible delay values τs (in sampling time units). For each of the τs's the continuity statistic ⟨ε★⟩ gets computed and further processed in order to find optimal delays τᵢ for each embedding cycle i.
  • w::Int = 0: Theiler window (neighbors in time with index w close to the point, that are excluded from being true neighbors). w=0 means to exclude only the point itself, and no temporal neighbors.
  • samplesize::Real = 1.0: Fraction of state space points to be considered (fiducial points v) to average ε★ over, in order to produce ⟨ε★⟩. Lower fraction value decreases accuracy as well as computation time.
  • K::Int = 13: the amount of nearest neighbors in the δ-ball (read algorithm description). Must be at least 8 (in order to gurantee a valid statistic). ⟨ε★⟩ is computed taking the minimum result over all k ∈ K.
  • KNN::Int = 3: the amount of nearest neighbors considered, in order to compute σk^2 (read algorithm description [`uzalcost]@ref). If given a vector, the minimum result over allknn ∈ KNN` is returned.
  • L_threshold::Real = 0: The algorithm breaks, when this threshold is exceeded by ΔL in an embedding cycle (set as a positive number, i.e. an absolute value of ΔL).
  • α::Real = 0.05: The significance level for obtaining the continuity statistic
  • p::Real = 0.5: The p-parameter for the binomial distribution used for the computation of the continuity statistic ⟨ε★⟩.
  • max_cycles = 50: The algorithm will stop after that many cycles no matter what.
  • econ::Bool = false: Economy-mode for L-statistic computation. Instead of computing L-statistics for time horizons 2:Tw, here we only compute them for 2:2:Tw, see description for further details.
  • verbose = true: Print information about the process.

Description

The method works iteratively and gradually builds the final embedding vectors Y. Based on the ⟨ε★⟩-statistic (of pecora) the algorithm picks an optimal delay value τᵢ for each embedding cycle i. For achieving that, we take the inpute time series s, denoted as the actual phase space trajectory Y_actual and compute the continuity statistic ⟨ε★⟩.

  1. Each local maxima in ⟨ε★⟩ is used for constructing a candidate embedding trajectory Y_trial with a delay corresponding to that specific peak in ⟨ε★⟩.
  2. We then compute the L-statistic (of uzal_cost) for Y_trial (L-trial) and Y_actual (L_actual) for increasing prediction time horizons (free parameter in the L-statistic) and save the maximum difference max(L-trial - L_actual) as ΔL (Note that this is a negative number, since the L-statistic decreases with better reconstructions).
  3. We pick the τ-value, for which ΔL is minimal (=maximum decrease of the overall L-value) and construct the actual embedding trajectory Y_actual (steps 1.-3. correspond to an embedding cycle).
  4. We repeat steps 1.-3. with Y_actual as input and stop the algorithm when ΔL is > 0, i.e. when and additional embedding component would not lead to a lower overall L-value. Y_actual -> Y.

In case of multivariate embedding, i.e. when embedding a set of M time series (s::StateSpaceSet), in each embedding cycle the continuity statistic ⟨ε★⟩ gets computed for all M time series available. The optimal delay value τ in each embedding cycle is chosen as the peak/τ-value for which ΔL is minimal under all available peaks and under all M ⟨ε★⟩'s. In the first embedding cycle there will be M² different ⟨ε★⟩'s to consider, since it is not clear a priori which time series of the input should consitute the first component of the embedding vector and form Y_actual.

The range of considered delay values is determined in τs and for the nearest neighbor search we respect the Theiler window w. The final embedding vector is stored in Y (StateSpaceSet). The chosen delay values for each embedding cycle are stored in τ_vals and the according time series numbers chosen for each delay value in τ_vals are stored in ts_vals. For univariate embedding (s::Vector) ts_vals is a vector of ones of length τ_vals, because there is simply just one timeseries to choose from. The function also returns the ΔLs-values for each embedding cycle and the continuity statistic ⟨ε★⟩ as an Array of Vectors.

For distance computations the Euclidean norm is used.

DelayEmbeddings.selfmutualinfoMethod
selfmutualinfo(s, τs; kwargs...) → m

Calculate the mutual information between the time series s and itself delayed by τ points for ττs, using an improvement of the method outlined by Fraser & Swinney in[Fraser1986].

Description

The joint space of s and its τ-delayed image () is partitioned as a rectangular grid, and the mutual information is computed from the joint and marginal frequencies of s and in the grid as defined in [1]. The mutual information values are returned in a vector m of the same length as τs.

If any of the optional keyword parameters is given, the grid will be a homogeneous partition of the space where s and are defined. The margins of that partition will be divided in a number of bins equal to nbins, such that the width of each bin will be binwidth, and the range of nonzero values of s will be in the centre. If only of those two parameters is given, the other will be automatically calculated to adjust the size of the grid to the area where s and are nonzero.

If no parameter is given, the space will be partitioned by a recursive bisection algorithm based on the method given in [1].

Notice that the recursive method of [1] evaluates the joint frequencies of s and in each cell resulting from a partition, and stops when the data points are uniformly distributed across the sub-partitions of the following levels. For performance and stability reasons, the automatic partition method implemented in this function is only used to divide the axes of the grid, using the marginal frequencies of s.

DelayEmbeddings.stochastic_indicatorMethod
stochastic_indicator(s::AbstractVector, τ:Int, ds = 2:5) -> E₂s

Compute an estimator for apparent randomness in a delay embedding with ds dimensions.

Description

Given the scalar timeseries s and the embedding delay τ compute the values of E₂ for each d ∈ ds, according to Cao's Method (eq. 5 of [Cao1997]).

Use this function to confirm that the input signal is not random and validate the results of delay_afnn. In the case of random signals, it should be E₂ ≈ 1 ∀ d.

DelayEmbeddings.uzal_costMethod
uzal_cost(Y::StateSpaceSet; kwargs...) → L

Compute the L-statistic L for input dataset Y according to Uzal et al.[Uzal2011], based on theoretical arguments on noise amplification, the complexity of the reconstructed attractor and a direct measure of local stretch which constitutes an irrelevance measure. It serves as a cost function of a state space trajectory/embedding and therefore allows to estimate a "goodness of a embedding" and also to choose proper embedding parameters, while minimizing L over the parameter space. For receiving the local cost function L_local (for each point in state space - not averaged), use uzal_cost_local(...).

Keyword arguments

  • samplesize = 0.5: Number of considered fiducial points v as a fraction of input state space trajectory Y's length, in order to average the conditional variances and neighborhood sizes (read algorithm description) to produce L.
  • K = 3: the amount of nearest neighbors considered, in order to compute σ_k^2 (read algorithm description). If given a vector, minimum result over all k ∈ K is returned.
  • metric = Euclidean(): metric used for finding nearest neigbhors in the input state space trajectory `Y.
  • w = 1: Theiler window (neighbors in time with index w close to the point, that are excluded from being true neighbors). w=0 means to exclude only the point itself, and no temporal neighbors.
  • Tw = 40: The time horizon (in sampling units) up to which E_k^2 gets computed and averaged over (read algorithm description).

Description

The L-statistic is based on theoretical arguments on noise amplification, the complexity of the reconstructed attractor and a direct measure of local stretch which constitutes an irrelevance measure. Technically, it is the logarithm of the product of σ-statistic and a normalization statistic α:

L = log10(σ*α)

The σ-statistic is computed as follows. σ = √σ² = √(E²/ϵ²). approximates the conditional variance at each point in state space and for a time horizon T ∈ Tw, using K nearest neighbors. For each reference point of the state space trajectory, the neighborhood consists of the reference point itself and its K+1 nearest neighbors. measures how strong a neighborhood expands during T time steps. is averaged over many time horizons T = 1:Tw. Consequently, ϵ² is the size of the neighborhood at the reference point itself and is defined as the mean pairwise distance of the neighborhood. Finally, σ² gets averaged over a range of reference points on the attractor, which is controlled by samplesize. This is just for performance reasons and the most accurate result will obviously be gained when setting samplesize=1.0

The α-statistic is a normalization factor, such that σ's from different embeddings can be compared. α² is defined as the inverse of the sum of the inverse of all ϵ²'s for all considered reference points.

DelayEmbeddings.uzal_cost_localMethod
uzal_cost_local(Y::StateSpaceSet; kwargs...) → L_local

Compute the local L-statistic L_local for input dataset Y according to Uzal et al.[Uzal2011]. The length of L_local is length(Y)-Tw and denotes a value of the local cost-function to each of the points of the state space trajectory.

In contrast to uzal_cost σ² here does not get averaged over all the state space reference points on the attractor. Therefore, the mean of 'Llocal' is different to L, when calling `uzalcost`, since the averaging is performed before logarithmizing.

Keywords as in uzal_cost.

DelayEmbeddings.uzal_cost_pecuzalMethod
uzal_cost_pecuzal(Y1::StateSpaceSet, Y2::StateSpaceSet, Tw; kwargs...) → L_decrease

This function is based on the functionality of uzal_cost, here specifically tailored for the needs in the PECUZAL algorithm. Compute the L-statistics L1 and L2 for the input datasets Y1 and Y2 for increasing time horizons T = 1:Tw. For each T, compute L1 and L2 and decrease L_decrease = L2 - L1. If L_decrease is a negative value, then Y2 can be regarded as a "better" reconstruction that Y1. Break, when L_decrease reaches the 1st local minima, since this will typically also be the global minimum. Return the according minimum L_decrease-value.

Keyword arguments

  • K = 3: the amount of nearest neighbors considered, in order to compute σ_k^2 (read algorithm description). If given a vector, minimum result over all k ∈ K is returned.
  • metric = Euclidean(): metric used for finding nearest neigbhors in the input state space trajectory `Y.
  • w = 1: Theiler window (neighbors in time with index w close to the point, that are excluded from being true neighbors). w=0 means to exclude only the point itself, and no temporal neighbors.
  • econ::Bool = false: Economy-mode for L-statistic computation. Instead of computing L-statistics for time horizons 2:Tw, here we only compute them for 2:2:Tw.
  • samplesize::Real = 1: determine the fraction of all phase space points to be considered (fiducial points v) for computing the L-statistic
DelayEmbeddings.τrangeMethod
τrange(s, de::AbstractEmbedding)

Return the range r of valid indices n to create delay vectors out of s using de.