CategoricalDistributions.UnivariateFinite
— Type
UnivariateFinite(support,
                 probs;
                 pool=nothing,
                 augment=false,
                 ordered=false)
Construct a discrete univariate distribution whose finite support is the elements of the vector support, and whose corresponding probabilities are elements of the vector probs. Alternatively, construct an abstract array of UnivariateFinite distributions by choosing probs to be an array of one higher dimension than the array generated.
Here the word "probabilities" is an abuse of terminology, as there is no requirement that the probabilities actually sum to one. The only requirement is that the probabilities have a common type T for which zero(T) is defined. In particular, UnivariateFinite objects implement arbitrary non-negative, signed, or complex measures over finite sets of labelled points. A UnivariateFinite distribution will be a bona fide probability measure when constructed using the augment=true option (see below) or when fit to data. The probabilities of a UnivariateFinite object d must be non-negative, with a non-zero sum, for rand(d) to be defined and interpretable.
Unless pool is specified, support should have type AbstractVector{<:CategoricalValue} and all elements are assumed to share the same categorical pool, which may be larger than support.
Important. All levels of the common pool have associated probabilities, not just those in the specified support. However, the probabilities of levels outside the specified support are always zero (see example below).
If probs is a matrix, it should have a column for each class in support (or one less, if augment=true). More generally, probs will be an array whose size is of the form (n1, n2, ..., nk, c), where c = length(support) (or one less, if augment=true) and the constructor then returns an array of UnivariateFinite distributions of size (n1, n2, ..., nk).
using CategoricalDistributions, CategoricalArrays, Distributions
samples = categorical(['x', 'x', 'y', 'x', 'z'])
julia> Distributions.fit(UnivariateFinite, samples)
UnivariateFinite{Multiclass{3}}
┌ ┐
x ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.6
y ┤■■■■■■■■■■■■ 0.2
z ┤■■■■■■■■■■■■ 0.2
└ ┘
julia> d = UnivariateFinite([samples[1], samples[end]], [0.1, 0.9])
UnivariateFinite{Multiclass{3}}(x=>0.1, z=>0.9)
UnivariateFinite{Multiclass{3}}
┌ ┐
x ┤■■■■ 0.1
z ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.9
└ ┘
julia> rand(d, 3)
3-element Array{Any,1}:
CategoricalValue{Symbol,UInt32} 'z'
CategoricalValue{Symbol,UInt32} 'z'
CategoricalValue{Symbol,UInt32} 'z'
julia> levels(samples)
3-element Array{Symbol,1}:
'x'
'y'
'z'
julia> pdf(d, 'y')
0.0
Specifying a pool
Alternatively, support may be a list of raw (non-categorical) elements if pool is:
- some CategoricalArray, CategoricalValue or CategoricalPool, such that support is a subset of levels(pool)
- missing, in which case a new categorical pool is created which has support as its only levels.
In the last case, specify ordered=true if the pool is to be considered ordered.
julia> UnivariateFinite(['x', 'z'], [0.1, 0.9], pool=missing, ordered=true)
UnivariateFinite{OrderedFactor{2}}
┌ ┐
x ┤■■■■ 0.1
z ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.9
└ ┘
samples = categorical(['x', 'x', 'y', 'x', 'z'])
julia> d = UnivariateFinite(['x', 'z'], [0.1, 0.9], pool=samples)
┌ ┐
x ┤■■■■ 0.1
z ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.9
└ ┘
julia> pdf(d, 'y') # allowed as `'y' in levels(samples)`
0.0
v = categorical(['x', 'x', 'y', 'x', 'z', 'w'])
probs = rand(100, 3)
probs = probs ./ sum(probs, dims=2)
julia> d1 = UnivariateFinite(['x', 'y', 'z'], probs, pool=v)
100-element UnivariateFiniteVector{Multiclass{4},Symbol,UInt32,Float64}:
UnivariateFinite{Multiclass{4}}(x=>0.194, y=>0.3, z=>0.505)
UnivariateFinite{Multiclass{4}}(x=>0.727, y=>0.234, z=>0.0391)
UnivariateFinite{Multiclass{4}}(x=>0.674, y=>0.00535, z=>0.321)
⋮
UnivariateFinite{Multiclass{4}}(x=>0.292, y=>0.339, z=>0.369)
Probability augmentation
If augment=true, the provided array is augmented by inserting appropriate elements ahead of those provided, along the last dimension of the array. This means the user only provides probabilities for the classes c2, c3, ..., cn. The class c1 probabilities are chosen so that each UnivariateFinite distribution in the returned array is a bona fide probability distribution.
julia> UnivariateFinite([0.1, 0.2, 0.3], augment=true, pool=missing)
3-element UnivariateFiniteArray{Multiclass{2}, String, UInt8, Float64, 1}:
UnivariateFinite{Multiclass{2}}(class_1=>0.9, class_2=>0.1)
UnivariateFinite{Multiclass{2}}(class_1=>0.8, class_2=>0.2)
UnivariateFinite{Multiclass{2}}(class_1=>0.7, class_2=>0.3)
d2 = UnivariateFinite(['x', 'y', 'z'], probs[:, 2:end], augment=true, pool=v)
julia> pdf(d1, levels(v)) ≈ pdf(d2, levels(v))
true
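The arithmetic behind augmentation can be sketched in plain Julia, without the package. This is only an illustration of the rule described above, not the package's internal code; the helper name augment_last_dim is hypothetical.

```julia
# Hypothetical helper: prepend, along the last dimension, the values that make
# every slice along that dimension sum to one.
function augment_last_dim(p::AbstractArray)
    k = ndims(p)
    first_class = 1 .- sum(p, dims=k)      # probabilities for class c1
    return cat(first_class, p; dims=k)     # c1 goes ahead of c2, ..., cn
end

p = reshape([0.1, 0.2, 0.3], 3, 1)         # three distributions, class c2 only
full = augment_last_dim(p)                 # 3×2 matrix, one row per distribution
```

Each row of full then sums to one, with the inserted first column playing the role of the class c1 probabilities.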
UnivariateFinite(prob_given_class; pool=nothing, ordered=false)
Construct a discrete univariate distribution whose finite support is the set of keys of the provided dictionary, prob_given_class, and whose values specify the corresponding probabilities.
The type requirements on the keys of the dictionary are the same as for the elements of support given above, with this exception: if non-categorical elements (raw labels) are used as keys, then pool=... must be specified and cannot be missing.
If the values (probabilities) are arrays instead of scalars, then an abstract array of UnivariateFinite elements is created, with the same size as the array.
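A minimal sketch of the dictionary form, assuming categorical values are used as keys so that no pool needs to be given. The data and variable names are illustrative only.

```julia
using CategoricalDistributions, CategoricalArrays, Distributions

v = categorical(["yes", "maybe", "no", "yes"])

# Keys are categorical values sharing the pool of `v`.
d = UnivariateFinite(Dict(v[1] => 0.3, v[2] => 0.7))

p_yes = pdf(d, "yes")    # probability stored under the key v[1]
p_no  = pdf(d, "no")     # "no" is a level of the pool but not a key
```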
CategoricalDistributions.UnivariateFiniteArray
— Type
UnivariateFiniteArray
Array type whose elements are UnivariateFinite distributions sharing a common sample space (CategoricalArrays pool).
See UnivariateFinite for constructor.
Base.isapprox
— Method
isapprox(d1::UnivariateFinite, d2::UnivariateFinite; kwargs...)
Returns true if and only if d1 and d2 have the same support and the corresponding probabilities are approximately equal. The keyword arguments kwargs are passed through to each call of isapprox on probability pairs.
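A short usage sketch, assuming the constructor form shown earlier for UnivariateFinite. The tolerance values are arbitrary choices made for illustration.

```julia
using CategoricalDistributions, CategoricalArrays, Distributions

v = categorical(["yes", "maybe", "no", "yes"])
d1 = UnivariateFinite(v[1:2], [0.3, 0.7])
d2 = UnivariateFinite(v[1:2], [0.3 + 1e-12, 0.7])
d3 = UnivariateFinite(v[1:2], [0.4, 0.6])

close_default = isapprox(d1, d2)             # default tolerances
far_default   = isapprox(d1, d3)             # probabilities differ by 0.1
close_loose   = isapprox(d1, d3; atol=0.2)   # `atol` is forwarded to each scalar comparison
```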
CategoricalDistributions._cumulative
— Method
_cumulative(d::UnivariateFinite)
Return the cumulative probability vector C for the distribution d, using only classes in the support of d, ordered according to the categorical elements used at instantiation of d. Used only to implement random sampling from d. We have C[1] == 0 and C[end] == 1, assuming the probabilities have been normalized.
CategoricalDistributions._rand
— Method
_rand(rng, p_cumulative, R)
Randomly sample the distribution with discrete support R(1):R(n) which has cumulative probability vector p_cumulative (see _cumulative).
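The role of the cumulative vector in sampling can be illustrated with a minimal inverse-CDF sketch in plain Julia. It is not the package's implementation; the helper names cumulative_vector and sample_index are hypothetical, and the normalization step is an assumption made here so that C[end] is approximately one.

```julia
using Random

# Hypothetical helper: cumulative vector C with C[1] == 0 and C[end] ≈ 1.
cumulative_vector(p) = vcat(zero(eltype(p)), cumsum(p) ./ sum(p))

# Hypothetical helper: draw an index i with probability C[i+1] - C[i].
function sample_index(rng::AbstractRNG, C::AbstractVector)
    u = rand(rng)                          # uniform on [0, 1)
    i = searchsortedlast(C, u)             # last i with C[i] <= u
    return min(i, length(C) - 1)           # guard against round-off in C[end]
end

rng = MersenneTwister(0)
C = cumulative_vector([0.1, 0.0, 0.9])     # the second class has zero probability
draws = [sample_index(rng, C) for _ in 1:1000]
```

Because the interval belonging to a zero-probability class has zero width, that class is never drawn in this sketch.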
CategoricalDistributions.classes
— Method
classes(x)
Return, as a CategoricalVector, all the categorical elements with the same pool as CategoricalValue x (including x), with an ordering consistent with the pool. Note that x in classes(x) is always true.
Not to be confused with levels(x.pool). See the example below.
Also overloaded for x a CategoricalArray, CategoricalPool, and for views of CategoricalArray.
Private method.
julia> v = categorical([:c, :b, :c, :a])
4-element CategoricalArrays.CategoricalArray{Symbol,1,UInt32}:
:c
:b
:c
:a
julia> levels(v)
3-element Array{Symbol,1}:
:a
:b
:c
julia> x = v[4]
CategoricalArrays.CategoricalValue{Symbol,UInt32} :a
julia> classes(x)
3-element CategoricalArrays.CategoricalArray{Symbol,1,UInt32}:
:a
:b
:c
julia> levels(x.pool)
3-element Array{Symbol,1}:
:a
:b
:c
CategoricalDistributions.classes
— Method
classes(d::UnivariateFinite)
classes(d::UnivariateFiniteArray)
A list of categorical elements in the common pool of classes used to construct d.
v = categorical(["yes", "maybe", "no", "yes"])
d = UnivariateFinite(v[1:2], [0.3, 0.7])
classes(d) # CategoricalArray{String,1,UInt32}["maybe", "no", "yes"]
CategoricalDistributions.decoder
— Method
d = decoder(x)
A callable object for decoding the integer representation of a CategoricalValue sharing the same pool as the CategoricalValue x. Specifically, one has d(int(y)) == y for all y in the same pool as x. One can also call d on integer arrays, in which case d is broadcast over all elements.
julia> v = categorical([:c, :b, :c, :a])
julia> int(v)
4-element Array{UInt32,1}:
0x00000003
0x00000002
0x00000003
0x00000001
julia> d = decoder(v[3])
julia> d(int(v)) == v
true
Warning: There is no guarantee that int(d(u)) == u will always hold.
See also: int.
CategoricalDistributions.int
— Method
int(x)
The positional integer of the CategoricalValue x, in the ordering defined by the pool of x. The type of int(x) is the reference type of x (which differentiates this method from CategoricalArrays.levelcode).
int(X::CategoricalArray)
int(W::AbstractArray{<:CategoricalValue})
Broadcasted versions of int.
julia> v = categorical([:c, :b, :c, :a])
julia> levels(v)
3-element Array{Symbol,1}:
:a
:b
:c
julia> int(v)
4-element Array{UInt32,1}:
0x00000003
0x00000002
0x00000003
0x00000001
See decoder on how to invert the int operation.
CategoricalDistributions.transform
— Method
transform(e::Union{CategoricalElement,CategoricalArray,CategoricalPool}, X)
Transform the specified object X into a categorical version, using the pool contained in e. Here X is a raw value (an element of levels(e)) or an AbstractArray of such values.
```julia
v = categorical(["x", "y", "y", "x", "x"])
julia> transform(v, "x")
CategoricalValue{String,UInt32} "x"

julia> transform(v[1], ["x" "x"; missing "y"])
2×2 CategoricalArray{Union{Missing, Symbol},2,UInt32}:
 "x"      "x"
 missing  "y"
```
Private method.
DataAPI.levels
— Method
levels(d::UnivariateFinite)
A list of the raw levels in the common pool of classes used to construct d, equal to CategoricalArrays.DataAPI.unwrap.(classes(d)).
v = categorical(["yes", "maybe", "no", "yes"])
d = UnivariateFinite(v[1:2], [0.3, 0.7])
levels(d) # Array{String, 1}["maybe", "no", "yes"]
Distributions.pdf
— Method
Dist.pdf(d::UnivariateFinite, x)
Probability of d at x.
v = categorical(["yes", "maybe", "no", "yes"])
d = UnivariateFinite(v[1:2], [0.3, 0.7])
pdf(d, "yes") # 0.3
pdf(d, v[1]) # 0.3
pdf(d, "no") # 0.0
pdf(d, "house") # throws error
Other similar methods are available too:
mode(d) # CategoricalValue{String, UInt32} "maybe"
rand(d, 5) # CategoricalArray{String,1,UInt32}["maybe", "no", "maybe", "maybe", "no"] or similar
d = fit(UnivariateFinite, v)
pdf(d, "maybe") # 0.25
logpdf(d, "maybe") # log(0.25)
One can also do weighted fits:
w = [1, 4, 5, 1] # some weights
d = fit(UnivariateFinite, v, w)
pdf(d, "maybe") ≈ 4/11 # true
See also classes, support.
Distributions.support
— Method
Dist.support(d::UnivariateFinite)
Dist.support(d::UnivariateFiniteArray)
Ordered list of classes associated with non-zero probabilities.
v = categorical(["yes", "maybe", "no", "yes"])
d = UnivariateFinite(v[1:2], [0.3, 0.7])
support(d) # CategoricalArray{String,1,UInt32}["maybe", "yes"]