AliasTables

AliasTables provides the AliasTable type, which is an object that defines a probability distribution over 1:n for some n. They are efficient to construct and very efficient to sample from.

Constructing an AliasTable

Construct an AliasTable by calling AliasTable(probabilities) for some collection of probabilities. For example, to create a table with a 30% chance of returning 1, and a 70% chance of returning 2, you would call AliasTable([0.3, 0.7]).

probabilities must be an abstract vector of real number. The sum need not be 1 as the input will be automatically normalized.

Sampling from an AliasTable

Sample from an AliasTable the same way you would sample from any sampleable object using the Random API. For example, to draw a single sample, call rand(at::AliasTable), to draw n samples, call rand(at::AliasTable, n), to sample using a specific random number generator, call rand(rng::Random.AbstractRNG, at::AliasTable), and to populate an existing array, call rand!(x, at::AliasTable).

Example

julia> using AliasTables

julia> at = AliasTable([5,10,1])
AliasTable([0x5000000000000000, 0xa000000000000000, 0x1000000000000000])

julia> rand(at, 8)
8-element Vector{Int64}:
 2
 2
 2
 2
 1
 1
 3
 2

julia> length(at)
3

julia> AliasTables.probabilities(float, at)
3-element Vector{Float64}:
 0.3125
 0.625
 0.0625

Implementation details

Alias tables are composed of a list of (acceptance probability, alias) pairs. To sample from an alias table, first pick an element (p, alias) from that list uniformly at random. Then, with probability p, return the index of that element and with probability 1-p, return alias. For more information, see the wikipedia article, or a publication by the original author Walker, A. J. "An Efficient Method for Generating Discrete Random Variables with General Distributions." ACM Transactions on Mathematical Software 3 (3): 253, 1977.


While this package does follow the general structure of the algorithms described in the above articles, it makes some departures for performance reasons.

Conventional alias tables map an integer in the range 1:n and a real number to a number in the range 1:n. This package's alias tables, however, map an integer in the range 0:2^k-1 to a number in the range 1:n. Where k = 64 by default. This results in increased precision and increased performance.

To apply this mapping to x using an alias table with 2^b elements, we use the most significant b bits of x to index into the table, retrieving a pair (p, alias). We then compare the least significant k bits of x to p, and if x's low bits are less than p, we return the aliased index, otherwise we return the index itself. Any finite probability distribution can be thought of as a distribution over 2^b elements by simply appending zeros to the end of the distribution.

This whole process uses integer arithmetic which allows both very fast sampling and exact construction.

We can count exactly how many inputs map to a given output as follows.

For a given output m ∈ 1:n, drawn from an AliasTable{UIntK} with a k-bit domain and a range of 1:2^b, the inputs that produce m come from two disjoint sets

  • The integers between m * 2^(k-b) + p, inclusive, and (m+1) * 2^(k-b), exclusive where p is the alias probability of the mth table entry. This set has 2^(k-b) - p elements.
  • The integers between j * 2^(k-b), inclusive and j + 2^(k-b) + p_j, exclusive for all j whose table entry aliases to m where p_j is alias probability of the jth table entry. This set has size sum(p_j for j in 1:n if alias_j == m).

The default constructors in this package utilize those formulae to produce alias tables that can exactly represent any distribution where all probabilities are of the form p/2^k for some integer p.

Alternate sampling API

You can bypass the Random API and sample directly from an AliasTable using the public AliasTables.sample function which is branchless, deterministic, and not pseudorandom. If given an input drawn uniformly at random from the domain of T, this method will produce a sample drawn from the distribution represented by at.

AliasTables.sampleFunction
sample(x::T, at::AliasTable{T, I}) -> I

Sample from at using the seed x.

If x is chosen uniformly at random from the set of all values representable by T then the output will be a random sample from the distribution represented by at. The mapping is deterministic and not pseudo-random so for patterned input x the output will be patterned as well.

See also AliasTable, AliasTables.probabilities

Performance characteristics

Constructing an AliasTable{T} is O(n) in the number of elements in the input collection, with a low constant factor. Sampling itself is O(1) with a very low constant factor. It is branchless, involves one random array read, and takes about 20 instructions more than rand(T).

julia> using Chairmarks

julia> @b rand(1000) AliasTable
13.250 μs (5 allocs: 23.906 KiB)

julia> @b AliasTable(rand(1000)) rand
3.059 ns

julia> @b rand(UInt64)
2.891 ns

julia> @b AliasTable(rand(1000)) rand(_, 1000)
1.588 μs (3 allocs: 7.875 KiB)

julia> @b rand(UInt64, 1000)
606.870 ns (3 allocs: 7.875 KiB)

Bulk generation of UInt64 has hand written llvm instructions to support SIMD, while alias tables don't SIMD as nicely and have not been as aggressively optimized; hence the difference in bulk generation time while scalar generation time is similar.

Docstrings

The docstring for the AliasTable constructor defines the API for constructing AliasTables, the AliasTables.probabilities function allows recovery of the exact sampling probabilities provided by an AliasTable, and the AliasTables.sample function provides an alternative API for sampling from AliasTables.

However, the primary sampling API is the Random API. AliasTables may be used as a sampling domain according to the specifications layed out by the Random stdlib.

AliasTables.AliasTableType
AliasTable{T<:Unsigned=UInt, I<:Integer=Int}(weights::AbstractVector{<:Real})

An efficient data structure for sampling from a discrete distribution.

Maps every value representable by T to a value of type I in eachindex(wights) such that the number of values maped to a given index of weights is proportional to the value at that index.

The mapping can be accessed directly via AliasTables.sample(x::T, at::AliasTable{T, I}) or indirectly via the Random API: rand(at), rand(rng, at), rand(at, dims...), etc.

Example

julia> at = AliasTable([1, 3, 1])
AliasTable([0x3333333333333334, 0x9999999999999999, 0x3333333333333333])

julia> rand(at, 5)
5-element Vector{Int64}:
 2
 3
 2
 2
 3
AliasTables.sampleFunction
sample(x::T, at::AliasTable{T, I}) -> I

Sample from at using the seed x.

If x is chosen uniformly at random from the set of all values representable by T then the output will be a random sample from the distribution represented by at. The mapping is deterministic and not pseudo-random so for patterned input x the output will be patterned as well.

See also AliasTable, AliasTables.probabilities

AliasTables.probabilitiesFunction
probabilities(at::AliasTable{T}) -> Vector{T}

Recover the exact sampling weights from a given AliasTable. The returned values will sum to one more than typemax(T), unless at is a constant distribution (e.g. AliasTable([0,1,0])), in which case the weights will sum to typemax(T).

See also AliasTable, AliasTables.sample

Examples

julia> at = AliasTable([1, 3, 1])
AliasTable([0x3333333333333334, 0x9999999999999999, 0x3333333333333333])

julia> AliasTables.probabilities(at)
3-element Vector{UInt64}:
 0x3333333333333334
 0x9999999999999999
 0x3333333333333333

julia> AliasTables.probabilities(AliasTable([0, 1, 0]))
3-element Vector{UInt64}:
 0x0000000000000000
 0xffffffffffffffff
 0x0000000000000000
probabilities(float, at::AliasTable{T}) -> Vector{<:AbstractFloat}

Return the sampling probabilities of at. The returned vector will sum to 1.0, up to rounding error.

Example

```jldoctest julia> AliasTables.probabilities(float, AliasTable([1, 3, 1])) 3-element Vector{Float64}: 0.2 0.6 0.2

Base.lengthMethod
length(at::AliasTable)

Get the number of weights that at was constructed with, including trailing zeros.

Example

julia> length(AliasTable([1, 3, 0]))
3