# Kernel density

`Entropies.NaiveKernel` (Type)

`NaiveKernel(ϵ::Real, ss = KDTree; w = 0, metric = Euclidean()) <: ProbabilitiesEstimator`

Estimate probabilities or entropy using a "naive" kernel density estimation (KDE) approach, as discussed in Prichard and Theiler (1995) [PrichardTheiler1995].

Probabilities $P(\mathbf{x}, \epsilon)$ are assigned to every point $\mathbf{x}$ by counting how many other points occupy the space spanned by a hypersphere of radius `ϵ` around $\mathbf{x}$, according to:

\[P_i( X, \epsilon) \approx \dfrac{1}{N} \sum_{s} B(||X_i - X_s|| < \epsilon),\]

where the sum runs over the other points $X_s$ and $B$ gives 1 if the argument is `true` and 0 otherwise. Probabilities are then normalized.
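The counting rule can be written out directly as a brute-force loop. The following is a minimal sketch for illustration only, not the implementation used by Entropies.jl: the function name `naive_kernel_probabilities` is hypothetical, the points are assumed to be stored as a vector of coordinate vectors, and the Euclidean norm is assumed as the metric.

```julia
using LinearAlgebra: norm

# Hypothetical helper (not part of Entropies.jl): direct evaluation of the
# counting rule for points stored as a vector of coordinate vectors.
function naive_kernel_probabilities(X::AbstractVector{<:AbstractVector}, ϵ::Real)
    N = length(X)
    counts = zeros(Int, N)
    for i in 1:N, s in 1:N
        s == i && continue                  # count only *other* points
        counts[i] += norm(X[i] - X[s]) < ϵ  # B(...) contributes 1 if true, else 0
    end
    # The 1/N prefactor cancels under normalization, so divide by the total count.
    # Assumes at least one pair of points lies within ϵ (otherwise this is 0/0).
    return counts ./ sum(counts)
end

# Three points clustered near the origin and one isolated point.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1]]
P = naive_kernel_probabilities(X, 1.0)
```

In this toy data set each clustered point has two neighbors within `ϵ = 1.0` and the isolated point has none, so after normalization the isolated point receives zero probability and the rest share the total equally.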

The search structure `ss` is any search structure supported by Neighborhood.jl. Specifically, use `KDTree` for a tree-based neighbor search, or `BruteForce` to compute the direct distances between all points. KDTrees heavily outperform direct distances when the dimensionality of the data is much smaller than the data length.

The keyword `w` sets the Theiler window: indices $s$ with $|i - s| ≤ w$ are excluded from the neighbor count of the given point $X_i$.
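To make the exclusion rule concrete, the sketch below lists which indices remain eligible as neighbors of a point for a given window. The helper name `theiler_candidates` is hypothetical and is not part of Entropies.jl; it only illustrates the condition $|i - s| > w$. For time series, such a window is typically used so that points close in time are not counted as neighbors merely because they are temporally correlated.

```julia
# Hypothetical helper: indices s that remain candidate neighbors of point i
# when a Theiler window w is applied, i.e. those with |i - s| > w.
theiler_candidates(i::Int, N::Int, w::Int) = [s for s in 1:N if abs(i - s) > w]

no_window   = theiler_candidates(5, 10, 0)  # only the point itself is excluded
with_window = theiler_candidates(5, 10, 2)  # indices 3 through 7 are excluded
```

With the default `w = 0`, only the point itself is excluded, which matches the "other points" wording of the counting rule.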

## Distance evaluation methods

Missing docstring for `TreeDistance`. Check Documenter's build log for details.

Missing docstring for `DirectDistance`. Check Documenter's build log for details.

## Example

Here, we draw some random points from a 2D normal distribution. Then, we use kernel density estimation to associate a probability with each point `p`, measured by how many points are within radius `1.5` of `p`. Plotting the points together with their associated probabilities, as estimated by the KDE procedure, gives the surface plot produced by the code below.

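```julia
using Distributions, PyPlot, DelayEmbeddings, Entropies
# 2D normal distribution with mean (1, -4) and standard deviation 2 along each axis
𝒩 = MvNormal([1, -4], 2)
N = 500
# Draw N points, sort them, and wrap them in a Dataset
D = Dataset(sort([rand(𝒩) for i = 1:N]))
x, y = columns(D)
# Probability of each point from the number of points within radius 1.5
p = probabilities(D, NaiveKernel(1.5))
# Plot the points with their estimated probabilities as a surface
surf(x, y, p.p)
xlabel("x"); ylabel("y")
savefig("kernel_surface.png")
```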

- [PrichardTheiler1995] Prichard, D., & Theiler, J. (1995). Generalized redundancies for time series analysis. Physica D: Nonlinear Phenomena, 84(3-4), 476-493.