Visitation frequency (binning)

Entropies.VisitationFrequency (Type)
VisitationFrequency(r::RectangularBinning) <: BinningProbabilitiesEstimator

A probability estimator based on binning data into rectangular boxes dictated by the binning scheme r.

Example

# Construct boxes by dividing each coordinate axis into 5 equal-length chunks.
b = RectangularBinning(5)

# A probabilities estimator that, when applied to a dataset, computes visitation
# frequencies over the boxes of the binning
est = VisitationFrequency(b)

See also: RectangularBinning.
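To make the idea of a visitation frequency concrete, here is a minimal sketch in plain Julia for 1-dimensional data. It does not call Entropies; the function name `visitation_frequencies_sketch` is hypothetical, and the exact padding formula is an assumption based on the description of `RectangularBinning` for integer `ϵ`. The sketch assumes the data are not all identical, and it keeps empty bins in the result.

```julia
# Hypothetical helper (not part of Entropies): relative visitation frequencies
# for 1-dimensional data on n equal-length bins spanning the data range.
# Assumes the data are not all identical (otherwise the bin width is zero).
function visitation_frequencies_sketch(x::AbstractVector{<:Real}, n::Int)
    lo, hi = extrema(x)
    # Assumed padding: extend the upper bound by 1/100th of a bin width so
    # that the maximum falls inside the last bin.
    hi_padded = hi + (hi - lo) / n / 100
    width = (hi_padded - lo) / n
    counts = zeros(Int, n)
    for xi in x
        i = floor(Int, (xi - lo) / width) + 1   # 1-based bin index
        counts[i] += 1
    end
    return counts ./ length(x)   # fraction of points that visit each bin
end

x = [0.0, 0.1, 0.2, 0.5, 0.9, 1.0]
p = visitation_frequencies_sketch(x, 5)
```

Three of the six points land in the first bin, one in the third, and two in the last, so the frequencies sum to one.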

Specifying binning/boxes

Entropies.RectangularBinning (Type)
RectangularBinning(ϵ) <: RectangularBinningScheme

Instructions for creating a rectangular box partition using the binning scheme ϵ. Binning instructions are deduced from the type of ϵ.

Rectangular binnings may be automatically adjusted to the data to which the RectangularBinning is applied, as follows:

  1. ϵ::Int divides each coordinate axis into ϵ equal-length intervals, extending the upper bound by 1/100th of a bin size to ensure all points are covered.

  2. ϵ::Float64 divides each coordinate axis into intervals of fixed size ϵ, starting from the axis minima until the data is completely covered by boxes.

  3. ϵ::Vector{Int} divides the i-th coordinate axis into ϵ[i] equal-length intervals, extending the upper bound by 1/100th of a bin size to ensure all points are covered.

  4. ϵ::Vector{Float64} divides the i-th coordinate axis into intervals of fixed size ϵ[i], starting from the axis minima until the data is completely covered by boxes.
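The difference between the integer and floating-point rules can be sketched for a single coordinate axis as follows. This is plain Julia, not the package's implementation: `edges_from_count` and `edges_from_size` are hypothetical helpers, and the way the padding and the number of fixed-size bins are computed is an assumption derived from the descriptions above.

```julia
# Hypothetical helpers (not part of Entropies) illustrating rules 1 and 2
# for one coordinate axis.

# Rule 1 (ϵ::Int): n equal-length intervals over the data range, with the
# upper bound padded by 1/100th of a bin size (assumed formula).
function edges_from_count(x::AbstractVector{<:Real}, n::Int)
    lo, hi = extrema(x)
    hi += (hi - lo) / n / 100
    return collect(range(lo, hi; length = n + 1))
end

# Rule 2 (ϵ::Float64): intervals of fixed size, starting at the axis minimum,
# added until the maximum is covered.
function edges_from_size(x::AbstractVector{<:Real}, binsize::Float64)
    lo, hi = extrema(x)
    nbins = floor(Int, (hi - lo) / binsize) + 1
    return [lo + k * binsize for k in 0:nbins]
end

x = [0.0, 0.25, 0.7, 1.0]
e1 = edges_from_count(x, 5)    # 6 edges; the last one lies slightly above 1.0
e2 = edges_from_size(x, 0.3)   # 5 edges; the last one lies at about 1.2
```

With an integer ϵ the number of bins is fixed and the bin width adapts to the data; with a floating-point ϵ the bin width is fixed and the number of bins adapts.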

Rectangular binnings may also be specified on arbitrary min-max ranges.

  1. ϵ::Tuple{Vector{Tuple{Float64,Float64}},Int64} creates intervals along each coordinate axis from ranges indicated by a vector of (min, max) tuples, then divides each coordinate axis into an integer number of equal-length intervals. Note: this does not ensure that all data points are covered by the binning (points outside the binning are ignored).

Example 1: Grid deduced automatically from data (partition guaranteed to cover data points)

Flexible box sizes

The following binning specification finds the minima/maxima along each coordinate axis, then splits each of those data ranges (with some tiny padding on the edges) into 10 equal-length intervals. This gives (hyper-)rectangular boxes, and works for data of any dimension.

using Entropies
RectangularBinning(10)

Now, assume the data consists of 2-dimensional points, and that we want a finer grid along one of the dimensions than over the other dimension.

The following binning specification finds the minima/maxima along each coordinate axis, then splits the range along the first coordinate axis (with some tiny padding on the edges) into 10 equal-length intervals, and the range along the second coordinate axis (with some tiny padding on the edges) into 5 equal-length intervals. This gives (hyper-)rectangular boxes.

using Entropies
RectangularBinning([10, 5])

Fixed box sizes

The following binning specification finds the minima/maxima along each coordinate axis, then splits the axis ranges into equal-length intervals of fixed size 0.5 until all data points are covered by boxes. This approach yields (hyper-)cubic boxes, and works for data of any dimension.

using Entropies
RectangularBinning(0.5)

Again, assume the data consists of 2-dimensional points, and that we want a finer grid along one of the dimensions than over the other dimension.

The following binning specification finds the minima/maxima along each coordinate axis, then splits the range along the first coordinate axis into equal-length intervals of size 0.3, and the range along the second axis into equal-length intervals of size 0.1 (in both cases, making sure the data are completely covered by the boxes). This approach gives (hyper-)rectangular boxes.

using Entropies
RectangularBinning([0.3, 0.1])
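As an illustration of what per-axis box sizes mean for individual points, the following plain-Julia sketch maps 2-dimensional points to integer box coordinates for box sizes 0.3 and 0.1. The helper `box_index` is hypothetical and is not the package's internal routine; it only assumes that boxes start at the axis minima, as described above.

```julia
# Hypothetical helper (not part of Entropies): 1-based integer box
# coordinates of a point, given the axis minima and per-axis box sizes.
function box_index(pt::AbstractVector{<:Real},
                   mins::AbstractVector{<:Real},
                   sizes::AbstractVector{<:Real})
    return [floor(Int, (pt[i] - mins[i]) / sizes[i]) + 1 for i in eachindex(pt)]
end

pts = [[0.0, 0.0], [0.35, 0.05], [0.95, 0.42]]
mins = [minimum(p[i] for p in pts) for i in 1:2]   # axis minima
sizes = [0.3, 0.1]                                 # box size along each axis
boxes = [box_index(p, mins, sizes) for p in pts]
```

The second axis is resolved three times more finely than the first, so points that share a box along the first axis can still be separated along the second.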

Example 2: Custom grids (partition not guaranteed to cover data points)

Assume the data consists of 3-dimensional points (x, y, z), and that we want a grid that is fixed over the intervals [x₁, x₂] for the first dimension, over [y₁, y₂] for the second dimension, and over [z₁, z₂] for the third dimension. We then want to split each of those ranges into 4 equal-length pieces. Beware: some points may fall outside the partition if the intervals are not chosen properly (these points are simply discarded).

The following binning specification produces the desired (hyper-)rectangular boxes.

using Entropies, DelayEmbeddings

D = Dataset(rand(100, 3));

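# Float64 literals, so that the interval vector has the documented
# element type Tuple{Float64,Float64}.
x₁, x₂ = 0.5, 1.0 # not completely covering the data, which are on [0, 1]
y₁, y₂ = -2.0, 1.5 # covering the data, which are on [0, 1]
z₁, z₂ = 0.0, 0.5 # not completely covering the data, which are on [0, 1]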

ϵ = [(x₁, x₂), (y₁, y₂), (z₁, z₂)], 4 # [interval 1, interval 2, ...], n_subdivisions

RectangularBinning(ϵ)
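To see which points such a custom grid would discard, one can check each point against the chosen intervals. The sketch below is plain Julia with a hypothetical helper `inside`; treating both interval endpoints as inclusive is an assumption, and the package may handle boundary points differently.

```julia
# Hypothetical helper (not part of Entropies): true if `pt` lies within every
# (min, max) interval. Inclusive endpoints are an assumption.
inside(pt, intervals) =
    all(lo <= pt[i] <= hi for (i, (lo, hi)) in enumerate(intervals))

intervals = [(0.5, 1.0), (-2.0, 1.5), (0.0, 0.5)]

pts = [[0.7, 0.2, 0.3],   # inside all three intervals
       [0.2, 0.2, 0.3],   # x is below 0.5, so the point is outside
       [0.7, 0.2, 0.9]]   # z is above 0.5, so the point is outside

kept = [p for p in pts if inside(p, intervals)]
```

Only the first point survives. Counting the discarded points this way is a quick check on whether the chosen intervals are reasonable for the data.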