Docstrings · CategoricalDistributions.jl

CategoricalDistributions.UnivariateFinite — Type

UnivariateFinite(support,
                 probs;
                 pool=nothing,
                 augment=false,
                 ordered=false)

Construct a discrete univariate distribution whose finite support consists of the elements of the vector support, and whose corresponding probabilities are the elements of the vector probs. Alternatively, construct an abstract array of UnivariateFinite distributions by choosing probs to be an array of one higher dimension than the array generated.

Here the word "probabilities" is an abuse of terminology, as there is no requirement that the probabilities actually sum to one. The only requirement is that the probabilities have a common type T for which zero(T) is defined. In particular, UnivariateFinite objects can represent arbitrary non-negative, signed, or complex measures over finite sets of labelled points. A UnivariateFinite distribution will be a bona fide probability measure when constructed using the augment=true option (see below) or when fit to data. For rand(d) to be defined and interpretable, the probabilities of a UnivariateFinite object d must be non-negative, with a non-zero sum.

Unless pool is specified, support should have type AbstractVector{<:CategoricalValue} and all elements are assumed to share the same categorical pool, which may be larger than support.

Important. All levels of the common pool have associated probabilities, not just those in the specified support. However, the probabilities of levels outside the support are always zero (see example below).

If probs is a matrix, it should have a column for each class in support (or one less, if augment=true). More generally, probs will be an array whose size is of the form (n1, n2, ..., nk, c), where c = length(support) (or one less, if augment=true) and the constructor then returns an array of UnivariateFinite distributions of size (n1, n2, ..., nk).

using CategoricalDistributions, CategoricalArrays, Distributions
samples = categorical(['x', 'x', 'y', 'x', 'z'])
julia> Distributions.fit(UnivariateFinite, samples)
           UnivariateFinite{Multiclass{3}}
     ┌                                        ┐
   x ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.6
   y ┤■■■■■■■■■■■■ 0.2
   z ┤■■■■■■■■■■■■ 0.2
     └                                        ┘

julia> d = UnivariateFinite([samples[1], samples[end]], [0.1, 0.9])
UnivariateFinite{Multiclass{3}}(x=>0.1, z=>0.9)
           UnivariateFinite{Multiclass{3}}
     ┌                                        ┐
   x ┤■■■■ 0.1
   z ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.9
     └                                        ┘

julia> rand(d, 3)
3-element Array{Any,1}:
 CategoricalValue{Symbol,UInt32} 'z'
 CategoricalValue{Symbol,UInt32} 'z'
 CategoricalValue{Symbol,UInt32} 'z'

julia> levels(samples)
3-element Array{Symbol,1}:
 'x'
 'y'
 'z'

julia> pdf(d, 'y')
0.0

Specifying a pool

Alternatively, support may be a list of raw (non-categorical) elements if pool is:

some CategoricalArray, CategoricalValue or CategoricalPool, such that support is a subset of levels(pool)
missing, in which case a new categorical pool is created which has support as its only levels.

In the last case, specify ordered=true if the pool is to be considered ordered.

julia> UnivariateFinite(['x', 'z'], [0.1, 0.9], pool=missing, ordered=true)
         UnivariateFinite{OrderedFactor{2}}
     ┌                                        ┐
   x ┤■■■■ 0.1
   z ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.9
     └                                        ┘

samples = categorical(['x', 'x', 'y', 'x', 'z'])
julia> d = UnivariateFinite(['x', 'z'], [0.1, 0.9], pool=samples)
     ┌                                        ┐
   x ┤■■■■ 0.1
   z ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.9
     └                                        ┘

julia> pdf(d, 'y') # allowed as `'y' in levels(samples)`
0.0

v = categorical(['x', 'x', 'y', 'x', 'z', 'w'])
probs = rand(100, 3)
probs = probs ./ sum(probs, dims=2)
julia> d1 = UnivariateFinite(['x', 'y', 'z'], probs, pool=v)
100-element UnivariateFiniteVector{Multiclass{4},Symbol,UInt32,Float64}:
 UnivariateFinite{Multiclass{4}}(x=>0.194, y=>0.3, z=>0.505)
 UnivariateFinite{Multiclass{4}}(x=>0.727, y=>0.234, z=>0.0391)
 UnivariateFinite{Multiclass{4}}(x=>0.674, y=>0.00535, z=>0.321)
   ⋮
 UnivariateFinite{Multiclass{4}}(x=>0.292, y=>0.339, z=>0.369)
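The shape rule for array-valued probs is stated for any number of leading dimensions, not only for matrices. The following is a minimal sketch of the three-dimensional case, assuming the constructor handles it as the rule describes; the array contents are made up for illustration.

```julia
using CategoricalDistributions, CategoricalArrays, Distributions

# probs has size (n1, n2, c) = (2, 3, 2); the last dimension indexes
# the two classes in the support.
probs = fill(0.5, 2, 3, 2)
d = UnivariateFinite(['x', 'z'], probs, pool=missing)

size(d)            # size of the array of distributions
pdf(d[1, 2], 'x')  # probability of 'x' in one element of the array
```

Indexing into the returned array yields a single UnivariateFinite distribution, which can then be queried with pdf as in the scalar examples.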

Probability augmentation

If augment=true, the provided array is augmented by inserting appropriate elements ahead of those provided, along the last dimension of the array. This means the user only provides probabilities for the classes c2, c3, ..., cn. The class c1 probabilities are chosen so that each UnivariateFinite distribution in the returned array is a bona fide probability distribution.

julia> UnivariateFinite([0.1, 0.2, 0.3], augment=true, pool=missing)
3-element UnivariateFiniteArray{Multiclass{2}, String, UInt8, Float64, 1}:
 UnivariateFinite{Multiclass{2}}(class_1=>0.9, class_2=>0.1)
 UnivariateFinite{Multiclass{2}}(class_1=>0.8, class_2=>0.2)
 UnivariateFinite{Multiclass{2}}(class_1=>0.7, class_2=>0.3)

d2 = UnivariateFinite(['x', 'y', 'z'], probs[:, 2:end], augment=true, pool=v)
julia> pdf(d1, levels(v)) ≈ pdf(d2, levels(v))
true
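The arithmetic behind augmentation can be sketched in plain Julia for the matrix case. The helper augment_probs below is hypothetical and is not the package's implementation; it only shows how the class c1 column is derived from the columns supplied for c2, ..., cn.

```julia
# Hypothetical helper, not part of CategoricalDistributions.jl.
# Rows are observations; columns are the probabilities of classes c2..cn.
function augment_probs(probs::AbstractMatrix{<:Real})
    # Whatever probability is left in each row goes to class c1.
    first_class = 1 .- sum(probs, dims=2)
    # Insert the c1 column ahead of the columns provided.
    return hcat(first_class, probs)
end

partial = reshape([0.1, 0.2, 0.3], 3, 1)  # three observations, class_2 only
full = augment_probs(partial)
```

Each row of full sums to one, which is what makes the resulting distributions bona fide probability distributions.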

UnivariateFinite(prob_given_class; pool=nothing, ordered=false)

Construct a discrete univariate distribution whose finite support is the set of keys of the provided dictionary, prob_given_class, and whose values specify the corresponding probabilities.

The type requirements on the keys of the dictionary are the same as for the elements of support given above, with this exception: if non-categorical elements (raw labels) are used as keys, then pool=... must be specified and cannot be missing.

If the values (probabilities) are arrays instead of scalars, then an abstract array of UnivariateFinite elements is created, with the same size as the array.
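As an illustration of the dictionary form, here is a short sketch covering both kinds of keys. The labels and probabilities are invented for the example.

```julia
using CategoricalDistributions, CategoricalArrays, Distributions

v = categorical(["yes", "maybe", "no", "yes"])

# Categorical keys: the pool is taken from the keys themselves.
d = UnivariateFinite(Dict(v[1] => 0.3, v[2] => 0.7))

# Raw keys: a pool must be supplied explicitly (and cannot be missing).
d_raw = UnivariateFinite(Dict("yes" => 0.3, "maybe" => 0.7), pool=v)

pdf(d, "yes")     # probability assigned to "yes"
pdf(d_raw, "no")  # "no" is in the pool but outside the support
```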

CategoricalDistributions.UnivariateFiniteArray — Type

UnivariateFiniteArray

Array type whose elements are UnivariateFinite distributions sharing a common sample space (CategoricalArrays pool).

See UnivariateFinite for constructor.

Base.isapprox — Method

isapprox(d1::UnivariateFinite, d2::UnivariateFinite; kwargs...)

Returns true if and only if d1 and d2 have the same support and the corresponding probabilities are approximately equal. The keyword arguments kwargs are passed through to each call of isapprox on probability pairs.
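A brief usage sketch, with a tolerance passed through as a keyword argument. The distributions here are illustrative and share a common pool v.

```julia
using CategoricalDistributions, CategoricalArrays

v = categorical(['x', 'y', 'z'])
d1 = UnivariateFinite(['x', 'z'], [0.1, 0.9], pool=v)
d2 = UnivariateFinite(['x', 'z'], [0.1 + 1e-9, 0.9 - 1e-9], pool=v)
d3 = UnivariateFinite(['x', 'z'], [0.5, 0.5], pool=v)

close_enough = isapprox(d1, d2; atol=1e-6)  # probabilities within tolerance
too_far = isapprox(d1, d3; atol=1e-6)       # same support, different probabilities
```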

CategoricalDistributions._cumulative — Method

_cumulative(d::UnivariateFinite)

Return the cumulative probability vector C for the distribution d, using only classes in the support of d, ordered according to the categorical elements used at instantiation of d. Used only to implement random sampling from d. We have C[1] == 0 and C[end] == 1, assuming the probabilities have been normalized.
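The shape of such a vector can be sketched in plain Julia. The function cumulative below is a hypothetical stand-in, not the private method itself, and it assumes that C carries a leading zero and so has one more entry than there are classes in the support.

```julia
# Hypothetical stand-in for the private method. `p` holds the
# probabilities of the classes in the support, in order.
cumulative(p::AbstractVector{<:Real}) = vcat(zero(eltype(p)), cumsum(p))

C = cumulative([0.25, 0.25, 0.5])
```

With normalized input the first entry is zero and the last is one, matching the property stated above.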

CategoricalDistributions._rand — Method

_rand(rng, p_cumulative, R)

Randomly sample the distribution with discrete support R(1):R(n) which has cumulative probability vector p_cumulative (see _cumulative).
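Sampling from a cumulative probability vector is commonly done by inverse-CDF lookup. The sketch below shows one way to do it in plain Julia. It is not the package's code, sample_index is a hypothetical name, and it returns an Int rather than a value of the reference type R.

```julia
using Random

# Hypothetical sketch of inverse-CDF sampling.
# `C` is a cumulative vector with C[1] == 0 and C[end] == 1.
# For u in [0, 1), return the index i such that C[i] <= u < C[i + 1].
sample_index(C::AbstractVector{<:Real}, u::Real) = searchsortedlast(C, u)

# Draw u uniformly from [0, 1) using the supplied random number generator.
sample_index(rng::AbstractRNG, C::AbstractVector{<:Real}) =
    sample_index(C, rand(rng))

C = [0.0, 0.25, 0.5, 1.0]
rng = MersenneTwister(1)
draws = [sample_index(rng, C) for _ in 1:1000]
```

Because the lookup finds the last entry of C not exceeding u, a class with zero probability is never selected.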

CategoricalDistributions.classes — Method

classes(x)

Return, as a CategoricalVector, all the categorical elements with the same pool as CategoricalValue x (including x), with an ordering consistent with the pool. Note that x in classes(x) is always true.

Not to be confused with levels(x.pool). See the example below.

Also overloaded for x a CategoricalArray or a CategoricalPool, and for views of a CategoricalArray.

Private method.

julia>  v = categorical([:c, :b, :c, :a])
4-element CategoricalArrays.CategoricalArray{Symbol,1,UInt32}:
 :c
 :b
 :c
 :a

julia> levels(v)
3-element Array{Symbol,1}:
 :a
 :b
 :c

julia> x = v[4]
CategoricalArrays.CategoricalValue{Symbol,UInt32} :a

julia> classes(x)
3-element CategoricalArrays.CategoricalArray{Symbol,1,UInt32}:
 :a
 :b
 :c

julia> levels(x.pool)
3-element Array{Symbol,1}:
 :a
 :b
 :c

CategoricalDistributions.classes — Method

classes(d::UnivariateFinite)
classes(d::UnivariateFiniteArray)

A list of the categorical elements in the common pool of classes used to construct d.

v = categorical(["yes", "maybe", "no", "yes"])
d = UnivariateFinite(v[1:2], [0.3, 0.7])
classes(d) # CategoricalArray{String,1,UInt32}["maybe", "no", "yes"]

CategoricalDistributions.decoder — Method

d = decoder(x)

A callable object for decoding the integer representation of a CategoricalValue sharing the same pool as the CategoricalValue x. Specifically, one has d(int(y)) == y for all y in the same pool as x. One can also call d on integer arrays, in which case d is broadcast over all elements.

julia> v = categorical([:c, :b, :c, :a])
julia> int(v)
4-element Array{UInt32,1}:
 0x00000003
 0x00000002
 0x00000003
 0x00000001
julia> d = decoder(v[3])
julia> d(int(v)) == v
true

Warning: There is no guarantee that int(d(u)) == u will always hold.
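The idea of a callable decoder can be illustrated with a toy model in plain Julia. ToyDecoder is hypothetical and much simpler than the real object: the pool is a plain vector of levels, a code is a position in that vector, and decoding returns raw labels rather than CategoricalValue elements.

```julia
# Toy model only; not the package's decoder.
struct ToyDecoder{T}
    levels::Vector{T}
end

# Decode a single integer code by looking up its level.
(d::ToyDecoder)(code::Integer) = d.levels[code]

# Decode an array of codes by broadcasting over its elements.
(d::ToyDecoder)(codes::AbstractArray{<:Integer}) = d.(codes)

toy = ToyDecoder([:a, :b, :c])
decoded = toy(UInt32[3, 2, 3, 1])
```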