AliasTables
AliasTables provides the AliasTable
type, which is an object that defines a probability distribution over 1:n
for some n
. They are efficient to construct and very efficient to sample from.
Constructing an AliasTable
Construct an AliasTable by calling AliasTable(probabilities)
for some collection of probabilities. For example, to create a table with a 30% chance of returning 1, and a 70% chance of returning 2, you would call AliasTable([0.3, 0.7])
.
probabilities
must be an abstract vector of real numbers. The sum need not be 1 as the input will be automatically normalized.
See AliasTables.set_weights!
for a way to change the weights of an existing alias table without GC-managed allocations.
Sampling from an AliasTable
Sample from an AliasTable
the same way you would sample from any sampleable object using the Random API. For example, to draw a single sample, call rand(at::AliasTable)
, to draw n
samples, call rand(at::AliasTable, n)
, to sample using a specific random number generator, call rand(rng::Random.AbstractRNG, at::AliasTable)
, and to populate an existing array, call rand!(x, at::AliasTable)
.
Example
julia> using AliasTables
julia> at = AliasTable([5,10,1])
AliasTable([0x5000000000000000, 0xa000000000000000, 0x1000000000000000])
julia> rand(at, 8)
8-element Vector{Int64}:
2
2
1
2
3
1
2
2
julia> length(at)
3
julia> AliasTables.probabilities(float, at)
3-element Vector{Float64}:
0.3125
0.625
0.0625
Implementation details
Alias tables are composed of a list of (acceptance probability, alias) pairs. To sample from an alias table, first pick an element (p, alias)
from that list uniformly at random. Then, with probability p
, return the index of that element and with probability 1-p
, return alias
. For more information, see the wikipedia article, or a publication by the original author Walker, A. J. "An Efficient Method for Generating Discrete Random Variables with General Distributions." ACM Transactions on Mathematical Software 3 (3): 253, 1977.
While this package does follow the general structure of the algorithms described in the above articles, it makes some departures for performance reasons.
Conventional alias tables map an integer in the range 1:n
and a real number to a number in the range 1:n
. This package's alias tables, however, map an integer in the range 0:2^k-1
to a number in the range 1:n
. Where k = 64
by default. This results in increased precision and increased performance.
To apply this mapping to x
using an alias table with 2^b
elements, we use the most significant b
bits of x
to index into the table, retrieving a pair (p, alias)
. We then compare the least significant k
bits of x
to p
, and if x
's low bits are less than p
, we return the aliased index, otherwise we return the index itself. Any finite probability distribution can be thought of as a distribution over 2^b
elements by simply appending zeros to the end of the distribution.
This whole process uses integer arithmetic which allows both very fast sampling and exact construction.
We can count exactly how many inputs map to a given output as follows.
For a given output m ∈ 1:n
, drawn from an AliasTable{UIntK}
with a k
-bit domain and a range of 1:2^b
, the inputs that produce m
come from two disjoint sets
- The integers between
m * 2^(k-b) + p
, inclusive, and(m+1) * 2^(k-b)
, exclusive wherep
is the alias probability of them
th table entry. This set has2^(k-b) - p
elements. - The integers between
j * 2^(k-b)
, inclusive andj + 2^(k-b) + p_j
, exclusive for allj
whose table entry aliases tom
wherep_j
is alias probability of thej
th table entry. This set has sizesum(p_j for j in 1:n if alias_j == m)
.
The default constructors in this package utilize those formulae to produce alias tables that can exactly represent any distribution where all probabilities are of the form p/2^k
for some integer p
.
Alternate sampling API
You can bypass the Random API and sample directly from an alias table at::AliasTable{T}
using the public AliasTables.sample
function which is branchless, deterministic, and not pseudorandom. If given an input drawn uniformly at random from the domain of T
, this method will produce a sample drawn from the distribution represented by at
.
AliasTables.sample
— Functionsample(x::T, at::AliasTable{T, I}) -> I
Sample from at
using the seed x
.
If x
is chosen uniformly at random from the set of all values representable by T
then the output will be a random sample from the distribution represented by at
. The mapping is deterministic and not pseudo-random so for patterned input x
the output will be patterned as well.
See also AliasTable
, AliasTables.probabilities
Performance characteristics
Constructing an AliasTable{T}
is O(n) in the number of elements in the input collection, with a low constant factor. Sampling itself is O(1) with a very low constant factor. It is branchless, involves one random array read, and takes about 20 instructions more than rand(T)
.
julia> using Chairmarks
julia> @b rand(1000) AliasTable
13.250 μs (5 allocs: 23.906 KiB)
julia> @b AliasTable(rand(1000)) rand
3.059 ns
julia> @b rand(UInt64)
2.891 ns
julia> @b AliasTable(rand(1000)) rand(_, 1000)
1.588 μs (3 allocs: 7.875 KiB)
julia> @b rand(UInt64, 1000)
606.870 ns (3 allocs: 7.875 KiB)
Bulk generation of UInt64 has hand written llvm instructions to support SIMD, while alias tables don't SIMD as nicely and have not been as aggressively optimized; hence the difference in bulk generation time while scalar generation time is similar.
Docstrings
The docstring for the AliasTable
constructor defines the API for constructing AliasTable
s, the AliasTables.probabilities
function allows recovery of the exact sampling probabilities provided by an AliasTable
, and the AliasTables.sample
function provides an alternative API for sampling from AliasTable
s.
However, the primary sampling API is the Random API. AliasTable
s may be used as a sampling domain according to the specifications layed out by the Random stdlib.
AliasTables.AliasTable
— TypeAliasTable{T<:Unsigned=UInt64, I<:Integer=Int}(weights::AbstractVector{<:Real})
An efficient data structure for sampling from a discrete distribution.
Maps every value representable by T
to a value of type I
in eachindex(wights)
such that the number of values maped to a given index of weights
is proportional to the value at that index.
The mapping can be accessed directly via AliasTables.sample(x::T, at::AliasTable{T, I})
or indirectly via the Random
API: rand(at)
, rand(rng, at)
, rand(at, dims...)
, etc.
See also AliasTables.set_weights!
Example
julia> at = AliasTable([1, 3, 1])
AliasTable([0x3333333333333334, 0x9999999999999999, 0x3333333333333333])
julia> rand(at, 5)
5-element Vector{Int64}:
2
3
2
2
3
AliasTables.sample
— Functionsample(x::T, at::AliasTable{T, I}) -> I
Sample from at
using the seed x
.
If x
is chosen uniformly at random from the set of all values representable by T
then the output will be a random sample from the distribution represented by at
. The mapping is deterministic and not pseudo-random so for patterned input x
the output will be patterned as well.
See also AliasTable
, AliasTables.probabilities
AliasTables.probabilities
— Functionprobabilities(at::AliasTable{T}) -> Vector{T}
Recover the exact sampling weights from a given AliasTable
. The returned values will sum to one more than typemax(T)
, unless at
is a constant distribution (e.g. AliasTable([0,1,0])
), in which case the weights will sum to typemax(T)
.
See also AliasTable
, AliasTables.sample
Examples
julia> at = AliasTable([1, 3, 1])
AliasTable([0x3333333333333334, 0x9999999999999999, 0x3333333333333333])
julia> AliasTables.probabilities(at)
3-element Vector{UInt64}:
0x3333333333333334
0x9999999999999999
0x3333333333333333
julia> AliasTables.probabilities(AliasTable([0, 1, 0]))
3-element Vector{UInt64}:
0x0000000000000000
0xffffffffffffffff
0x0000000000000000
probabilities(float, at::AliasTable{T}) -> Vector{<:AbstractFloat}
Return the sampling probabilities of at
. The returned vector will sum to 1.0, up to rounding error.
Example
julia> AliasTables.probabilities(float, AliasTable([1, 3, 1]))
3-element Vector{Float64}:
0.2
0.6
0.2
AliasTables.set_weights!
— Functionset_weights!(at::AliasTable, weights::AbstractVector{<:Real})
Set the weights of at
to weights
and return at
.
Does not perform GC managed allocations.
Example
julia> at = AliasTable([1, 3, 1])
AliasTable([0x3333333333333334, 0x9999999999999999, 0x3333333333333333])
julia> at === AliasTables.set_weights!(at, [1, 2, 1])
true
julia> at
AliasTable([0x4000000000000000, 0x8000000000000000, 0x4000000000000000])
Base.length
— Methodlength(at::AliasTable)
Get the number of weights that at
was constructed with, including trailing zeros.
Example
julia> length(AliasTable([1, 3, 0]))
3