# DataTypes

`DiscreteEntropy.EntropyData`

— Type

```
abstract type EntropyData
Histogram <: EntropyData
Samples <: EntropyData
```

When confronted with a vector such as $[1,2,3,4,5,4]$, it is easy to forget whether it represents samples from a distribution or a histogram of a (discrete) distribution. *DiscreteEntropy.jl* makes this mistake harder to commit by enforcing a type difference between a vector of samples and a vector of counts.
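The idea of separating samples from counts at the type level can be sketched in plain Julia. The wrapper types `DemoCounts` and `DemoSamples` and the function `total_observations` below are hypothetical names invented for this illustration; they are not the package's definitions.

```julia
# Hypothetical wrapper types, for illustration only.
struct DemoCounts
    values::Vector{Int}
end

struct DemoSamples
    values::Vector{Int}
end

# For a histogram, the number of observations is the sum of the bins.
total_observations(h::DemoCounts) = sum(h.values)

# For samples, the number of observations is the number of entries.
total_observations(s::DemoSamples) = length(s.values)

raw = [1, 2, 3, 4, 5, 4]
n_as_counts = total_observations(DemoCounts(raw))    # 19
n_as_samples = total_observations(DemoSamples(raw))  # 6
```

The same raw vector gives different answers depending on how it is wrapped, and passing the bare vector raises a `MethodError` because no method accepts it, so the caller has to state the interpretation explicitly.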

`DiscreteEntropy.AbstractCounts`

— Type

`AbstractCounts{T<:Real,V<:AbstractVector{T}} <: AbstractVector{T}`

Enforced type incompatibility between vectors of samples, vectors of counts, and vectors of xi.

**CountVector**

A vector representing a histogram

**SampleVector**

A vector of samples

**XiVector**

A vector of xi values for use with the `schurmann_generalised` estimator.

`DiscreteEntropy.CountData`

— Type

`CountData`

**Fields**

- `multiplicities::Matrix{Float64}`: multiplicity representation of data
- `N::Float64`: total number of samples
- `K::Int64`: observed support size

**Multiplicities**

All of the estimators operate over a multiplicity representation of raw data. Raw data takes the form of either a vector of samples or a vector of counts (i.e. a histogram).

Given `histogram = [1,2,3,2,1,4]`, the multiplicity representation is

\[\begin{pmatrix} 4 & 2 & 3 & 1 \\ 1 & 2 & 1 & 2 \end{pmatrix}\]

The top row represents bin contents, and the bottom row the number of bins with that content. We have 1 bin with 4 elements, 2 bins with 2 elements, 1 bin with 3 elements and 2 bins with only 1 element.

The advantages of the multiplicity representation are compactness and efficiency. Instead of calculating the surprisal of a bin of 2 twice, we can calculate it once and multiply by the multiplicity. A possible downside of the representation is accumulated floating-point error from the multiplication.
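As a rough illustration of the representation, the following standalone sketch builds a multiplicity table for the histogram above and uses it to compute a plug-in entropy once per distinct bin content. The helpers `multiplicities` and `plugin_entropy` are assumptions made for this example and use a `Dict` rather than the package's matrix layout; they are not the package's internals.

```julia
# Map each distinct bin content to the number of bins holding it.
function multiplicities(hist::AbstractVector{<:Integer})
    mult = Dict{Int,Int}()
    for c in hist
        mult[c] = get(mult, c, 0) + 1
    end
    return mult
end

# Plug-in entropy in nats: each distinct content is evaluated once
# and weighted by its multiplicity.
function plugin_entropy(mult::Dict{Int,Int})
    n = sum(c * m for (c, m) in mult)   # total number of observations
    return -sum(m * (c / n) * log(c / n) for (c, m) in mult)
end

hist = [1, 2, 3, 2, 1, 4]
mult = multiplicities(hist)   # contents 4, 2, 3, 1 with multiplicities 1, 2, 1, 2
h_mult = plugin_entropy(mult)

# The same quantity computed bin by bin, for comparison.
n = sum(hist)
h_direct = -sum((c / n) * log(c / n) for c in hist)
```

The two results agree up to floating-point rounding; the multiplicity version evaluates the logarithm four times instead of six.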

**Constructor**

`CountData` is not expected to be called directly, nor is it advised to directly manipulate the fields. Use either `from_data`, `from_counts` or `from_samples` instead.

`DiscreteEntropy.from_counts`

— Function

```
from_counts(counts::AbstractVector; remove_zeros::Bool=true)
from_counts(counts::CountVector, remove_zeros::Bool)
```

Return a `CountData` object from a vector or `CountVector`. Many estimators cannot handle a histogram with a 0 value bin, so these are filtered out unless `remove_zeros` is set to `false`.
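One reason an empty bin is troublesome can be seen with a naive plug-in entropy, sketched here in plain Julia. The function `naive_entropy` is a hypothetical helper written for this example and is not one of the package's estimators.

```julia
# Plug-in entropy in nats, computed bin by bin with no filtering.
function naive_entropy(hist::AbstractVector{<:Integer})
    n = sum(hist)
    return -sum((c / n) * log(c / n) for c in hist)
end

hist = [3, 0, 2, 1]

# The empty bin contributes 0.0 * log(0.0) = 0.0 * -Inf, which is NaN.
h_raw = naive_entropy(hist)

# Dropping empty bins first gives a finite result.
cleaned = filter(!iszero, hist)
h_clean = naive_entropy(cleaned)
```

Removing the zeros before estimation avoids the `NaN` without changing the total number of observations.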

`DiscreteEntropy.from_data`

— Function

`from_data(data::AbstractVector, ::Type{T}; remove_zeros=true) where {T<:EntropyData}`

Create a `CountData` object from a vector or matrix. The function is parameterised on whether the data contains samples or a histogram.

While *remove_zeros* defaults to *true*, this might not be the desired behaviour for `Samples`. A 0 value in the histogram causes problems for the estimators, but a 0 value in a vector of samples may be perfectly legitimate.
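The difference between a 0 in a histogram and a 0 in a sample vector can be made concrete with a small standalone sketch. The helper `tally` is hypothetical and only illustrates the distinction; it is not how the package converts samples.

```julia
# Count how often each distinct sample value occurs.
function tally(samples::AbstractVector)
    counts = Dict{eltype(samples),Int}()
    for s in samples
        counts[s] = get(counts, s, 0) + 1
    end
    return counts
end

samples = [0, 1, 0, 2, 0]

# Here 0 is an observed value, seen three times.
counts = tally(samples)

# The resulting histogram has no empty bins.
hist = sort(collect(values(counts)); rev=true)

# Filtering zeros out of the samples would discard three of five observations.
remaining = filter(!iszero, samples)
```

In the sample vector the 0 is a label whose bin holds three observations, so removing it changes the data rather than cleaning it.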

`DiscreteEntropy.from_samples`

— Function

`from_samples(sample::SampleVector, remove_zeros::Bool)`

Return a `CountData` object from a vector of samples.

## Vector Types

`DiscreteEntropy.cvector`

— Function

```
cvector(vs::AbstractVector{<:Integer})
cvector(vs::AbstractVector{<:Real}) = CountVector(vs)
cvector(vs::AbstractArray{<:Real}) = CountVector(vec(vs))
```

Convert an AbstractVector into a CountVector. A CountVector represents the frequency of sampled values.

`DiscreteEntropy.svector`

— Function

```
svector(vs::AbstractVector{<:Integer})
svector(vs::AbstractVector{<:Real})
svector(vs::AbstractArray{<:Real})
```

Convert an AbstractVector into a SampleVector. A SampleVector represents a sequence of sampled values.

`DiscreteEntropy.xivector`

— Function

```
xivector(vs::AbstractVector{<:Real})
xivector(vs::AbstractArray{<:Real})
```

Convert an `AbstractVector{Real}` into a `XiVector`. Exclusively for use with `schurmann_generalised`.