# Estimators

We split the estimators into two broad categories, which we call Frequentist and Bayesian. We also have a few composite estimators that either take an averaging or resampling approach to estimation.

## Frequentist Estimators

### DiscreteEntropy.maximum_likelihood

```julia
maximum_likelihood(data::CountData)::Float64
```

Compute the maximum likelihood estimation of Shannon entropy of data in nats.

$$\hat{H}_{\tiny{ML}} = - \sum_{i=1}^K p_i \log(p_i)$$

or equivalently

$$\hat{H}_{\tiny{ML}} = \log(N) - \frac{1}{N} \sum_{i=1}^{K} h_i \log(h_i)$$
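Both forms are easy to check numerically. The following is an illustrative Python sketch of the two formulas above (not the package's Julia implementation), working directly from a histogram of counts:

```python
import math

def entropy_ml(counts):
    """Plug-in (maximum likelihood) entropy in nats: -sum p_i log p_i."""
    n = sum(counts)
    return -sum((h / n) * math.log(h / n) for h in counts if h > 0)

def entropy_ml_alt(counts):
    """Equivalent form: log(N) - (1/N) * sum h_i log h_i."""
    n = sum(counts)
    return math.log(n) - sum(h * math.log(h) for h in counts if h > 0) / n
```

The equivalence follows by expanding $\log(h_i/N) = \log(h_i) - \log(N)$ inside the first sum.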
### DiscreteEntropy.jackknife_mle

```julia
jackknife_mle(data::CountData; corrected=false)::Tuple{AbstractFloat, AbstractFloat}
```

Compute the jackknifed maximum_likelihood estimate of data, together with the variance of the jackknife procedure (not the variance of the estimator itself).

If corrected is true, the variance is scaled by n-1; otherwise it is scaled by n.

Estimation of the size of a closed population when capture probabilities vary among animals
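The leave-one-out construction behind this estimator can be sketched in Python (illustrative only, not the package's Julia implementation; the exact variance scaling follows the docstring's corrected flag):

```python
import math

def entropy_ml(counts):
    n = sum(counts)
    return -sum((h / n) * math.log(h / n) for h in counts if h > 0)

def jackknife_mle(counts, corrected=False):
    """Jackknife bias correction of the plug-in entropy.

    Returns (estimate, jackknife variance). All observations in the same
    category give the same leave-one-out value, so we weight by h_i
    instead of looping over all N observations."""
    n = sum(counts)
    h_full = entropy_ml(counts)
    loo, weights = [], []
    for i, h in enumerate(counts):
        if h == 0:
            continue
        reduced = list(counts)
        reduced[i] -= 1
        loo.append(entropy_ml([c for c in reduced if c > 0]))
        weights.append(h)
    mean_loo = sum(w * e for w, e in zip(weights, loo)) / n
    # Standard jackknife bias-corrected estimate: N*H - (N-1)*mean(loo)
    estimate = n * h_full - (n - 1) * mean_loo
    denom = (n - 1) if corrected else n
    var = (n - 1) / denom * sum(w * (e - mean_loo) ** 2
                                for w, e in zip(weights, loo))
    return estimate, var
```

The jackknife pushes the (negatively biased) plug-in estimate upward, so the corrected value exceeds the plain MLE on small samples.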

### DiscreteEntropy.miller_madow

```julia
miller_madow(data::CountData)
```

Compute the Miller-Madow estimate of Shannon entropy, which adds a positive bias-correction term based on the total number of samples seen (N) and the observed support size (K).

$$\hat{H}_{\tiny{MM}} = \hat{H}_{\tiny{ML}} + \frac{K - 1}{2N}$$
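The correction is a one-liner on top of the plug-in estimate. An illustrative Python sketch of the formula (not the package's Julia code):

```python
import math

def miller_madow(counts):
    """Plug-in entropy plus the Miller-Madow bias correction (K-1)/(2N)."""
    counts = [h for h in counts if h > 0]
    n, k = sum(counts), len(counts)
    h_ml = -sum((h / n) * math.log(h / n) for h in counts)
    return h_ml + (k - 1) / (2 * n)
```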
### DiscreteEntropy.schurmann

```julia
schurmann(data::CountData, ξ::Float64 = ℯ^(-1/2))
```

Compute the Schurmann estimate of Shannon entropy of data in nats.

$$\hat{H}_{SHU} = \psi(N) - \frac{1}{N} \sum_{i=1}^{K} \, h_i \left( \psi(h_i) + (-1)^{h_i} \int_0^{\frac{1}{\xi} - 1} \frac{t^{h_i}-1}{1+t}dt \right)$$

There is no one ideal value for $\xi$; however, the paper suggests $\xi = e^{(-1/2)} \approx 0.6$.


### DiscreteEntropy.schurmann_generalised

```julia
schurmann_generalised(data::CountVector, xis::XiVector{T}) where {T<:Real}
```


$$\hat{H}_{\tiny{SHU}} = \psi(N) - \frac{1}{N} \sum_{i=1}^{K} \, h_i \left( \psi(h_i) + (-1)^{h_i} \int_0^{\frac{1}{\xi_i} - 1} \frac{t^{h_i}-1}{1+t}dt \right)$$

Compute the generalised Schurmann entropy estimate, given a count vector data and a xi vector xis, which must both be the same length.

```julia
schurmann_generalised(data::CountVector, xis::Distribution, scalar=false)
```

Compute the generalised Schurmann entropy estimate, given a count vector data and a distribution xis from which the $\xi_i$ values are drawn.

### DiscreteEntropy.chao_shen

```julia
chao_shen(data::CountData)
```

Compute the Chao-Shen estimate of the Shannon entropy of data in nats.

$$\hat{H}_{CS} = - \sum_{i=1}^{K} \frac{\hat{p}_i^{CS} \log \hat{p}_i^{CS}}{1 - (1 - \hat{p}_i^{CS})^N}$$

where

$$\hat{p}_i^{CS} = \hat{C} \, \hat{p}_i^{ML}, \qquad \hat{C} = 1 - \frac{f_1}{N}$$

with $\hat{C}$ the estimated sample coverage and $f_1$ the number of singletons in data.
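The coverage-adjusted, Horvitz-Thompson-weighted construction can be sketched in Python (an illustrative sketch using the standard Chao-Shen coverage estimate $\hat{C} = 1 - f_1/N$, not the package's Julia implementation):

```python
import math

def chao_shen(counts):
    """Chao-Shen entropy: shrink MLE probabilities by estimated coverage,
    then weight each term by its inclusion probability 1 - (1-p)^N."""
    counts = [h for h in counts if h > 0]
    n = sum(counts)
    f1 = sum(1 for h in counts if h == 1)
    if f1 == n:          # all singletons: coverage estimate degenerates
        f1 = n - 1
    c_hat = 1 - f1 / n   # estimated sample coverage
    total = 0.0
    for h in counts:
        p = c_hat * h / n
        total -= p * math.log(p) / (1 - (1 - p) ** n)
    return total
```

With no singletons the coverage is 1 and only the inclusion-probability weighting remains.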
### DiscreteEntropy.zhang

```julia
zhang(data::CountData)
```

Compute the Zhang estimate of the Shannon entropy of data in nats.

The recommended definition of Zhang's estimator is from Grabchak et al.

$$\hat{H}_Z = \sum_{i=1}^K \hat{p}_i \sum_{v=1}^{N - h_i} \frac{1}{v} \prod_{j=0}^{v-1} \left( 1 + \frac{1 - h_i}{N - 1 - j} \right)$$

The actual algorithm comes from *Fast Calculation of Entropy with Zhang's Estimator* by Lozano et al.
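A direct evaluation of the formula above (quadratic in N, before the fast-calculation optimisations of Lozano et al.) can be sketched in Python, for illustration only:

```python
import math

def zhang(counts):
    """Direct O(N^2) evaluation of the Zhang/Grabchak formula."""
    counts = [h for h in counts if h > 0]
    n = sum(counts)
    total = 0.0
    for h in counts:
        p = h / n
        inner = 0.0
        prod = 1.0
        for v in range(1, n - h + 1):
            # Incrementally extend the product up to j = v - 1
            prod *= 1 + (1 - h) / (n - 1 - (v - 1))
            inner += prod / v
        total += p * inner
    return total
```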

Entropy estimation in Turing's perspective

### DiscreteEntropy.bonachela

```julia
bonachela(data::CountData)
```

Compute the Bonachela estimator of the Shannon entropy of data in nats.

$$\hat{H}_{B} = \frac{1}{N+2} \sum_{i=1}^{K} \left( (h_i + 1) \sum_{j=h_i + 2}^{N+2} \frac{1}{j} \right)$$
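The formula is a short double sum over the histogram. An illustrative Python sketch (not the package's Julia implementation):

```python
def bonachela(counts):
    """Bonachela small-sample entropy estimator, direct from the formula."""
    counts = [h for h in counts if h > 0]
    n = sum(counts)
    total = 0.0
    for h in counts:
        # Harmonic tail: j runs from h_i + 2 to N + 2 inclusive
        tail = sum(1.0 / j for j in range(h + 2, n + 3))
        total += (h + 1) * tail
    return total / (n + 2)
```

For two singletons ([1, 1]) each tail is $1/3 + 1/4$, giving $\hat{H}_B = 7/12$.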

Entropy estimates of small data sets

### DiscreteEntropy.shrink

```julia
shrink(data::CountData)
```

Compute the Shrinkage, or James-Stein estimator of Shannon entropy for data in nats.

$$\hat{H}_{\tiny{SHR}} = - \sum_{x=1}^{K} \hat{p}_x^{\tiny{SHR}} \log(\hat{p}_x^{\tiny{SHR}})$$

where

$$\hat{p}_x^{\tiny{SHR}} = \lambda t_x + (1 - \lambda) \hat{p}_x^{\tiny{ML}}$$

and

$$\lambda = \frac{ 1 - \sum_{x=1}^{K} (\hat{p}_x^{\tiny{ML}})^2}{(N-1) \sum_{x=1}^K (t_x - \hat{p}_x^{\tiny{ML}})^2}$$

with

$$t_x = 1 / K$$

Notes

Based on the implementation in the R package entropy

Entropy Inference and the James-Stein Estimator

### DiscreteEntropy.chao_wang_jost

```julia
chao_wang_jost(data::CountData)
```

Compute the Chao Wang Jost Shannon entropy estimate of data in nats.

$$\hat{H}_{\tiny{CWJ}} = \sum_{1 \leq h_i \leq N-1} \frac{h_i}{N} \left(\sum_{k=h_i}^{N-1} \frac{1}{k} \right) + \frac{f_1}{N} (1 - A)^{-N + 1} \left\{ - \log(A) - \sum_{r=1}^{N-1} \frac{1}{r} (1 - A)^r \right\}$$

with

$$A = \begin{cases} \frac{2 f_2}{(N-1) f_1 + 2 f_2} \, & \text{if} \, f_2 > 0 \\ \frac{2}{(N-1)(f_1 - 1) + 2} \, & \text{if} \, f_2 = 0, \; f_1 \neq 0 \\ 1, & \text{if} \, f_1 = f_2 = 0 \end{cases}$$

where $f_1$ is the number of singletons and $f_2$ the number of doubletons in data.
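The two-part structure (an observed-counts term plus a singleton correction) can be sketched directly from the equations above. An illustrative Python version, not the entropart or Julia implementation:

```python
import math

def chao_wang_jost(counts):
    """Chao-Wang-Jost entropy: harmonic-sum term plus singleton correction."""
    counts = [h for h in counts if h > 0]
    n = sum(counts)
    f1 = sum(1 for h in counts if h == 1)
    f2 = sum(1 for h in counts if h == 2)
    # First term: sum over observed counts with 1 <= h_i <= N - 1
    est = sum(h / n * sum(1.0 / k for k in range(h, n))
              for h in counts if 1 <= h <= n - 1)
    # Case analysis for A (f2 = 0 case uses denominator (N-1)(f1-1) + 2)
    if f2 > 0:
        a = 2 * f2 / ((n - 1) * f1 + 2 * f2)
    elif f1 > 0:
        a = 2 / ((n - 1) * (f1 - 1) + 2)
    else:
        a = 1.0
    # Singleton correction vanishes when A = 1 (or when f1 = 0)
    if a < 1:
        tail = -math.log(a) - sum((1 - a) ** r / r for r in range(1, n))
        est += f1 / n * (1 - a) ** (1 - n) * tail
    return est
```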

Notes

The algorithm is a slightly modified port of that used in the entropart R library.

Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species

## Bayesian Estimators

### DiscreteEntropy.bayes

```julia
bayes(data::CountData, α::AbstractFloat; K=nothing)
```

Compute an estimate of Shannon entropy given data and a concentration parameter $α$. If K is not provided, then the observed support size in data is used.

$$\hat{H}_{\text{Bayes}} = - \sum_{k=1}^{K} \hat{p}_k^{\text{Bayes}} \; \log \hat{p}_k^{\text{Bayes}}$$

where

$$\hat{p}_k^{\text{Bayes}} = \frac{h_k + α}{N + A}$$

and

$$A = \sum_{x=1}^{K} α_{x}$$

In addition to setting your own α, we have the following suggested choices

1. jeffrey : α = 0.5
2. laplace: α = 1.0
3. schurmann_grassberger: α = 1 / K
4. minimax: α = √N / K
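The Dirichlet-smoothed plug-in behind all of these choices can be sketched in Python (an illustrative sketch of the posterior-mean formula above, not the package's Julia implementation):

```python
import math

def bayes_entropy(counts, alpha, k=None):
    """Plug-in entropy of the posterior-mean probabilities
    p_k = (h_k + alpha) / (N + K * alpha)."""
    counts = [h for h in counts if h > 0]
    if k is None:
        k = len(counts)  # default: observed support size
    n = sum(counts)
    a_total = k * alpha  # A = sum of the K concentration parameters
    probs = [(h + alpha) / (n + a_total) for h in counts]
    # Symbols allowed by K but never observed each get mass alpha
    probs += [alpha / (n + a_total)] * (k - len(counts))
    return -sum(p * math.log(p) for p in probs if p > 0)

# Two of the suggested concentration parameters
def jeffrey(counts):
    return bayes_entropy(counts, 0.5)

def laplace(counts):
    return bayes_entropy(counts, 1.0)
```

With α = 0 the estimator reduces to the plain maximum likelihood estimate.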
### DiscreteEntropy.nsb

```julia
nsb(data, K=data.K)
```

Return the Bayesian estimate of the Shannon entropy of data, using the Nemenman-Shafee-Bialek (NSB) algorithm.

$$\hat{H}^{\text{NSB}} = \frac{ \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n}) \langle H^m \rangle_{\beta (\xi)} }{ \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n})}$$

where

$$\rho(\xi \mid \textbf{n}) = \mathcal{P}(\beta (\xi)) \frac{ \Gamma(\kappa(\xi))}{\Gamma(N + \kappa(\xi))} \prod_{i=1}^K \frac{\Gamma(n_i + \beta(\xi))}{\Gamma(\beta(\xi))}$$
### DiscreteEntropy.ansb

```julia
ansb(data::CountData; undersampled::Float64=0.1)::Float64
```

Return the Asymptotic NSB estimation of the Shannon entropy of data in nats.

See Asymptotic NSB estimator (equations 11 and 12)

$$\hat{H}_{\tiny{ANSB}} = (C_\gamma - \log(2)) + 2 \log(N) - \psi(\Delta)$$

where $C_\gamma$ is the Euler-Mascheroni constant ($\approx 0.57721$), $\psi$ is the digamma function and $\Delta$ is the number of coincidences in the data.

This estimator is designed for the extremely undersampled regime (K ~ N) and diverges with N when well-sampled. ANSB requires that $N/K \to 0$; by default we require $N/K < 0.1$.


## Mixed Estimators

### DiscreteEntropy.pert

```julia
pert(data::CountData, estimator::Type{T}) where {T<:AbstractEstimator}
pert(data::CountData, e1::Type{T}, e2::Type{T}) where {T<:AbstractEstimator}
```

A PERT estimate of entropy, a weighted three-point combination

$$\hat{H}_{\tiny{PERT}} = \frac{a + 4b + c}{6}$$

where

1. a = best-case estimate
2. b = most likely estimate
3. c = worst-case estimate

The default estimators are a = maximum_likelihood (best case), b = chao_shen (most likely) and c = ansb (worst case).
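The combination step itself is tiny; an illustrative Python sketch (with the plug-in estimator standing in for all three component estimators, which are passed as plain functions):

```python
import math

def entropy_ml(counts):
    n = sum(counts)
    return -sum((h / n) * math.log(h / n) for h in counts if h > 0)

def pert(counts, best, likely, worst):
    """PERT three-point combination of three entropy estimators."""
    a, b, c = best(counts), likely(counts), worst(counts)
    return (a + 4 * b + c) / 6
```

When all three component estimators agree, the combination returns their common value unchanged.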

### DiscreteEntropy.jackknife

```julia
jackknife(data::CountData, estimator::Type{T}; corrected=false) where {T<:AbstractEstimator}
```

Compute the jackknifed estimate of estimator on data.

### DiscreteEntropy.bayesian_bootstrap

```julia
bayesian_bootstrap(samples::SampleVector, estimator::Type{T}, reps, seed, concentration) where {T<:AbstractEstimator}
```

Compute a Bayesian bootstrap resampling of samples for estimation with estimator, where reps is the number of resamplings to perform, seed is the random seed and concentration is the concentration parameter of the Dirichlet distribution.

Note