Entropies.jl

This package provides probability and entropy estimators used for entropy computations in the CausalityTools.jl and DynamicalSystems.jl packages.

Most of the code in this package assumes that your data is represented by the Dataset type from DelayEmbeddings.jl, where each observation is a D-dimensional data point represented by a static vector. See the DynamicalSystems.jl documentation for more info. Univariate timeseries given as AbstractVector{<:Real} also work with some estimators, but are handled differently depending on which probability/entropy estimation method is applied.

API

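The main API of this package is contained in two functions:

  • probabilities, which computes probability distributions from the given data
  • genentropy, which uses the output of probabilities, or a set of pre-computed Probabilities, to calculate entropies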

These functions dispatch on subtypes of ProbabilitiesEstimator, which are:

using Entropies, InteractiveUtils
subtypes(ProbabilitiesEstimator)
6-element Vector{Any}:
 CountOccurrences
 Entropies.BinningProbabilitiesEstimator
 Entropies.CountingBasedProbabilityEstimator
 Entropies.WaveletProbabilitiesEstimator
 NaiveKernel
 SymbolicProbabilityEstimator

Probabilities

Entropies.Probabilities (Type)
Probabilities(x) → p

A simple wrapper type around an x::AbstractVector which ensures that p sums to 1. Behaves identically to Vector.
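To make the idea of a normalizing wrapper concrete, here is a minimal sketch in plain Julia. The type name `NormalizedVector` is hypothetical and stands in for the wrapper described above; it is not the package's actual definition, and the positivity check is an assumption of this sketch.

```julia
# Hypothetical stand-in for a normalizing wrapper; not the package's actual type.
struct NormalizedVector{T<:AbstractFloat} <: AbstractVector{T}
    p::Vector{T}
    function NormalizedVector(x::AbstractVector{<:Real})
        s = sum(x)
        # Assumption of this sketch: reject weights that cannot be normalized.
        s > 0 || throw(ArgumentError("weights must have a positive sum"))
        p = collect(float.(x) ./ s)   # divide by the total so the entries sum to 1
        new{eltype(p)}(p)
    end
end

# Forward the minimal AbstractVector interface to the wrapped vector.
Base.size(v::NormalizedVector) = size(v.p)
Base.getindex(v::NormalizedVector, i::Int) = v.p[i]
Base.IndexStyle(::Type{<:NormalizedVector}) = IndexLinear()

counts = [2, 3, 5]
p = NormalizedVector(counts)   # behaves like a vector whose entries sum to 1
```

Because the wrapper subtypes AbstractVector, generic functions such as `sum`, `length`, and indexing work on it without further definitions.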

Entropies.probabilities (Function)
probabilities(x::Vector_or_Dataset, est::ProbabilitiesEstimator) → p::Probabilities

Calculate probabilities representing x based on the provided estimator and return them as a Probabilities container (Vector-like). The probabilities are typically unordered and may or may not contain 0s; see the documentation of the individual estimators for more.

The configuration options are always given as arguments to the chosen estimator.

probabilities(x::Vector_or_Dataset, ε::AbstractFloat) → p::Probabilities

Convenience syntax which provides probabilities for x based on rectangular binning (i.e. computing a histogram). In short, the state space is divided into boxes of side length ε, and formally we use est = VisitationFrequency(RectangularBinning(ε)) as an estimator, see VisitationFrequency.

This method has a linearithmic time complexity (n log(n) for n = length(x)) and a linear space complexity (l for l = dimension(x)). This allows computation of probabilities (histograms) of high-dimensional datasets and with small box sizes ε without memory overflow and with maximum performance. To obtain the bin information along with p, use binhist.
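One way to see where an n log(n) cost can come from is a sort-based histogram, sketched below in plain Julia. This is an illustration under stated assumptions, not the package's internal implementation: `box_probabilities` is a hypothetical helper, points are given as a vector of vectors rather than a Dataset, and the grid origin is taken to be the minimum of the data along each dimension.

```julia
# Illustrative sketch only; `box_probabilities` is a hypothetical helper.
function box_probabilities(points::Vector{<:AbstractVector{<:Real}}, ε::Real)
    isempty(points) && throw(ArgumentError("need at least one point"))
    D = length(first(points))
    # Assumption: the grid origin is the lower corner of the bounding box.
    mins = [minimum(pt[d] for pt in points) for d in 1:D]
    # Integer box coordinates of every point: O(n) work.
    boxes = [ntuple(d -> floor(Int, (pt[d] - mins[d]) / ε), D) for pt in points]
    # Sorting places points of the same box next to each other: O(n log n).
    sort!(boxes)
    # A single pass over the sorted boxes counts the length of each run.
    counts = Int[]
    previous = nothing
    for b in boxes
        if b == previous
            counts[end] += 1
        else
            push!(counts, 1)
            previous = b
        end
    end
    return counts ./ length(points)
end

points = [[0.05, 0.05], [0.1, 0.2], [0.9, 0.95], [0.12, 0.18]]
p = box_probabilities(points, 0.5)
```

Only occupied boxes are ever stored in this sketch, so memory use does not grow with the total number of boxes in the grid, which is what makes small ε feasible in high dimensions.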

probabilities(x::Vector_or_Dataset, n::Integer) → p::Probabilities

Same as the above method, but now each dimension of the data is binned into n::Integer equal-sized bins instead of bins of length ε::AbstractFloat.
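As a rough illustration for a single dimension, the bin width can be derived from the range of the data. The helper `bin_indices` is hypothetical, and the way the maximum value is clamped into the last bin is an assumption of this sketch, not a statement about how the package treats the upper edge.

```julia
# Hypothetical helper; the clamping of the maximum is an assumption of this sketch.
function bin_indices(x::AbstractVector{<:Real}, n::Integer)
    n > 0 || throw(ArgumentError("n must be positive"))
    lo, hi = extrema(x)
    width = (hi - lo) / n                       # edge length of each of the n bins
    width > 0 || return zeros(Int, length(x))   # constant data falls into one bin
    # Zero-based bin index; the maximum is clamped into the last bin.
    return [min(floor(Int, (v - lo) / width), n - 1) for v in x]
end

x = [0.0, 0.1, 0.4, 0.6, 1.0]
idx = bin_indices(x, 2)   # bins [0.0, 0.5) and [0.5, 1.0]
```

For multivariate data, the same computation would be repeated independently along each dimension.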

probabilities(x::Vector_or_Dataset) → p::Probabilities

Directly count probabilities from the elements of x without any discretization, binning, or other processing (mostly useful when x contains categorical or integer data).
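Direct counting can be sketched with a dictionary of occurrence counts. The helper `count_probabilities` below is hypothetical and only illustrates the idea; it is not the package's code.

```julia
# Hypothetical helper illustrating direct counting; not the package's code.
function count_probabilities(x::AbstractVector)
    counts = Dict{eltype(x),Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    # Dict iteration order is unspecified, so the probabilities come out unordered.
    return [c / length(x) for c in values(counts)]
end

p = count_probabilities(["a", "b", "a", "c", "a", "b"])
```

The result contains one probability per distinct value (here 1/2, 1/3, and 1/6 in some order), which is why this approach suits categorical or integer data.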

Entropies.probabilities! (Function)
probabilities!(args...)

Identical to probabilities(args...), but allows pre-allocation of temporarily used containers.

Only works for certain estimators. See for example SymbolicPermutation.

Generalized entropy

Entropies.genentropy (Function)
genentropy(p::Probabilities; q = 1.0, base = MathConstants.e)

Compute the generalized order-q entropy of some probabilities returned by the probabilities function. Alternatively, compute entropy from pre-computed Probabilities.

genentropy(x::Vector_or_Dataset, est; q = 1.0, base)

A convenience syntax, which first calls probabilities(x, est) and then calculates the entropy of the result (thus est can be a ProbabilitiesEstimator or simply ε::Real).

Description

Let $p$ be an array of probabilities (summing to 1). Then the generalized (Rényi) entropy [Rényi1960] is

\[H_q(p) = \frac{1}{1-q} \log \left(\sum_i p[i]^q\right)\]

and generalizes other known entropies, such as the information entropy ($q = 1$, understood as the limit $q \to 1$, which gives $-\sum_i p[i] \log p[i]$; see [Shannon1948]), the maximum entropy ($q = 0$, also known as Hartley entropy), and the correlation entropy ($q = 2$, also known as collision entropy).
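The formula and its special cases can be checked numerically with a few lines of plain Julia. The function `renyi_entropy` below is a hypothetical reference sketch of the formula, not the package's genentropy; dropping zero-probability entries before summing is a choice of this sketch, made so that $q = 0$ counts only outcomes that actually occur.

```julia
# Hypothetical reference sketch of the formula; not the package's implementation.
function renyi_entropy(p::AbstractVector{<:Real}; q::Real = 1.0, base::Real = MathConstants.e)
    nz = filter(>(0), p)   # zero-probability outcomes are dropped (choice of this sketch)
    if q ≈ 1
        # Limit q → 1 of the general formula: Shannon entropy, -Σ p log p.
        return -sum(w * log(base, w) for w in nz)
    end
    return log(base, sum(w^q for w in nz)) / (1 - q)
end

p = [0.5, 0.25, 0.25]
h0 = renyi_entropy(p; q = 0, base = 2)   # Hartley entropy: log2 of the number of outcomes
h1 = renyi_entropy(p; q = 1, base = 2)   # Shannon entropy in bits
h2 = renyi_entropy(p; q = 2, base = 2)   # collision entropy
```

For this distribution the three values decrease as q grows, which matches the general property that the Rényi entropy is non-increasing in its order.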

Fast histograms

Entropies.binhist (Function)
binhist(x::AbstractDataset, ε::Real) → p, bins
binhist(x::AbstractDataset, ε::RectangularBinning) → p, bins

Hyper-optimized histogram calculation for x with rectangular binning ε. Returns the probabilities p of each bin of the histogram as well as the bins. Note that bins contains the starting corner of each bin. If ε isa Real, then the actual bin size is ε across each dimension. If ε isa RectangularBinning, then the bin size for each dimension will depend on the binning scheme.
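To illustrate what a "starting corner" means, the sketch below maps each occupied box back to its lower corner. The helper `starting_corners` is hypothetical, and both the vector-of-vectors input and the choice of the data minimum as grid origin are assumptions of this sketch rather than documented behavior of binhist.

```julia
# Hypothetical helper; the grid origin (data minimum) is an assumption of this sketch.
function starting_corners(points::Vector{<:AbstractVector{<:Real}}, ε::Real)
    D = length(first(points))
    mins = [minimum(pt[d] for pt in points) for d in 1:D]
    # Integer box coordinates of the occupied boxes, without duplicates.
    boxes = unique(ntuple(d -> floor(Int, (pt[d] - mins[d]) / ε), D) for pt in points)
    # The starting corner is the origin shifted by whole multiples of ε.
    return [[mins[d] + b[d] * ε for d in 1:D] for b in sort(boxes)]
end

points = [[0.0, 0.0], [0.2, 0.1], [1.3, 0.4], [1.1, 1.9]]
corners = starting_corners(points, 1.0)
```

Here the first two points share the box whose corner is the origin, so four points produce three corners.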

See also: RectangularBinning.

  • [Rényi1960] A. Rényi, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, pp 547 (1960)
  • [Shannon1948] C. E. Shannon, Bell System Technical Journal 27, pp 379 (1948)