# Mutual Information and Conditional Information

DiscreteEntropy.mutual_information (function)
mutual_information(X::CountData, Y::CountData, XY::CountData, estimator::Type{T}) where {T<:AbstractEstimator}
mutual_information(joint::Matrix{I}, estimator::Type{T}) where {T<:AbstractEstimator, I<:Real}

Compute the mutual information between $X$ and $Y$, defined as

$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p_{X,Y}(x, y) \log \left(\frac{p_{X,Y}(x, y)}{p_X(x) \, p_Y(y)}\right)$$

The implementation does not evaluate this sum directly. Instead, it uses the identity

$$I(X;Y) = H(X) + H(Y) - H(X,Y)$$

where $H(X,Y)$ is the entropy of the joint distribution of $X$ and $Y$.
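The identity can be checked numerically on a small table of counts. The following is a minimal sketch in plain Julia, assuming a maximum-likelihood (plug-in) estimate and natural logarithms. It does not call DiscreteEntropy, and the helper names `plugin_entropy` and `plugin_mutual_information`, as well as the convention that rows index $X$ and columns index $Y$, are assumptions made for illustration only.

```julia
# Plug-in (maximum likelihood) entropy in nats from a collection of counts.
# Hypothetical helper for illustration; not part of DiscreteEntropy.
function plugin_entropy(counts)
    n = sum(counts)
    h = 0.0
    for c in counts
        if c > 0                 # 0 * log(0) is taken to be 0
            p = c / n
            h -= p * log(p)
        end
    end
    return h
end

# I(X;Y) = H(X) + H(Y) - H(X,Y).
# Assumed convention: rows of `joint` index X, columns index Y.
function plugin_mutual_information(joint::AbstractMatrix{<:Real})
    hx  = plugin_entropy(sum(joint, dims=2))   # marginal counts of X
    hy  = plugin_entropy(sum(joint, dims=1))   # marginal counts of Y
    hxy = plugin_entropy(joint)                # joint counts of (X, Y)
    return hx + hy - hxy
end

independent = [10 10; 10 10]   # X and Y are independent
dependent   = [20 0; 0 20]     # Y is fully determined by X

mi_independent = plugin_mutual_information(independent)
mi_dependent   = plugin_mutual_information(dependent)
```

Under these assumptions, the independent table gives a mutual information of zero (up to floating-point error), while the fully dependent table gives $\ln 2$, the entropy of either marginal. Other estimators will generally give different values on the same counts.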

DiscreteEntropy.conditional_entropy (function)
conditional_entropy(X::CountData, XY::CountData, estimator::Type{T}) where {T<:NonParameterisedEstimator}
conditional_entropy(joint::Matrix{R}, estimator::Type{NSB}; dim=1, guess=false, KJ=nothing, KX=nothing) where {R<:Real}
conditional_entropy(joint::Matrix{R}, estimator::Type{Bayes}, α; dim=1, KJ=nothing, KX=nothing) where {R<:Real}

Compute the conditional entropy of $Y$ given $X$, defined as

$$H(Y \mid X) = - \sum_{x \in X, y \in Y} p(x, y) \ln \frac{p(x, y)}{p(x)}$$

The estimate of $H(Y \mid X)$ is computed from the counts of $X$ and of $(X, Y)$ with the given estimator, using the chain rule

$$\hat{H}(Y \mid X) = \hat{H}(X, Y) - \hat{H}(X)$$
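To illustrate that the chain rule agrees with the definition of conditional entropy, here is a minimal sketch in plain Julia that computes both from the same joint counts. It assumes plug-in (maximum-likelihood) probabilities and natural logarithms, and it does not use DiscreteEntropy. The function names and the convention that rows index $X$ and columns index $Y$ are hypothetical choices for this example and may not match the library's `dim` keyword.

```julia
# Plug-in entropy in nats from counts (hypothetical helper, not the library's API).
function plugin_entropy(counts)
    n = sum(counts)
    h = 0.0
    for c in counts
        c > 0 || continue        # skip empty cells: 0 * log(0) is taken to be 0
        p = c / n
        h -= p * log(p)
    end
    return h
end

# Chain rule: H(Y|X) = H(X,Y) - H(X).
# Assumed convention: rows of `joint` index X, columns index Y.
chain_rule_conditional_entropy(joint) =
    plugin_entropy(joint) - plugin_entropy(sum(joint, dims=2))

# Direct definition: H(Y|X) = -sum over (x, y) of p(x,y) * ln(p(x,y) / p(x)).
function direct_conditional_entropy(joint)
    n = sum(joint)
    row_totals = sum(joint, dims=2)          # counts of X
    h = 0.0
    for x in axes(joint, 1), y in axes(joint, 2)
        c = joint[x, y]
        c > 0 || continue
        # p(x,y) / p(x) simplifies to c / row_totals[x]
        h -= (c / n) * log(c / row_totals[x])
    end
    return h
end

joint = [30 10; 5 15]
h_chain  = chain_rule_conditional_entropy(joint)
h_direct = direct_conditional_entropy(joint)
```

For the plug-in estimate the two values coincide up to floating-point error. With bias-corrected estimators, the difference of two separately estimated entropies need not equal a direct estimate of the conditional sum, so the chain-rule form is a definition of the estimator rather than an exact identity.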