# Samplers

## Defining a Sampler

In this section, we outline the requirements and guidelines for defining a belief `Sampler`.

### Interface

The `Sampler` interface has only one method: the functor. For example, if you wanted to implement your own `Sampler`, you could write something like this:

```julia
struct MySampler <: Sampler
    foo
    bar
end

# functor definition
function (s::MySampler)(pomdp::POMDP)
    # YOUR CODE HERE
    return sampled_beliefs
end
```
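As a concrete illustration, here is a minimal sampler that collects the beliefs visited along a single random rollout. This is a hypothetical sketch, not part of the package: `RandomStepSampler` and its field are made up for this example, and it assumes the standard POMDPs.jl / POMDPTools simulation API.

```julia
using POMDPs, POMDPTools, POMDPModels
using CompressedBeliefMDPs

# Hypothetical sampler: gathers beliefs from one random rollout.
struct RandomStepSampler <: Sampler
    n::Int  # maximum number of simulated steps
end

function (s::RandomStepSampler)(pomdp::POMDP)
    policy = RandomPolicy(pomdp)
    updater = DiscreteUpdater(pomdp)
    # start from the initial belief, then record each updated belief
    beliefs = Any[initialize_belief(updater, initialstate(pomdp))]
    for b in stepthrough(pomdp, policy, updater, "b", max_steps=s.n)
        push!(beliefs, b)
    end
    return unique(beliefs)
end

pomdp = TigerPOMDP()
sampler = RandomStepSampler(5)
beliefs = sampler(pomdp)
```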

## Implemented Samplers

CompressedBeliefMDPs provides the following generic belief samplers:

### Exploratory Belief Expansion

```julia
BeliefExpansionSampler
```

Fast extension of exploratory belief expansion (Algorithm 21.13 in Algorithms for Decision Making) that uses $k$-d trees.

Fields

• `updater::Updater`: The updater used to update beliefs.
• `metric::NearestNeighbors.MinkowskiMetric`: The metric used to measure distances between beliefs. It must be a Minkowski metric.
• `n::Integer`: The number of belief expansions to perform.

Constructors

```julia
BeliefExpansionSampler(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp),
    metric::NearestNeighbors.MinkowskiMetric=Euclidean(), n::Integer=3)
```

Methods

```julia
(s::BeliefExpansionSampler)(pomdp::POMDP)
```

Creates an initial belief and performs exploratory belief expansion. Returns the unique belief states. Only works for POMDPs with discrete state, action, and observation spaces.

Example Usage

```julia-repl
julia> pomdp = TigerPOMDP();

julia> sampler = BeliefExpansionSampler(pomdp; n=2);

julia> beliefs = sampler(pomdp)
Set{DiscreteBelief{TigerPOMDP, Bool}} with 4 elements:
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
```
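Once sampled, beliefs are typically stacked into a matrix (one row per belief, one column per state) before being handed to a compressor. A minimal sketch in plain Julia, where ordinary probability vectors stand in for the `b` field of each `DiscreteBelief`:

```julia
# Each vector below stands in for the probability vector of one sampled belief.
beliefs = [
    [0.5, 0.5],
    [0.85, 0.15],
    [0.15, 0.85],
]

# hcat produces a states-by-beliefs matrix; permutedims flips it so
# each row is one belief, the layout most compressors expect.
B = permutedims(reduce(hcat, beliefs))

size(B)  # (3, 2): 3 beliefs over 2 states
```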

### Policy Sampler

```julia
PolicySampler
```

Samples belief states by rolling out a Policy.

Fields

• `policy::Policy`: The policy used for decision making.
• `updater::Updater`: The updater used for updating beliefs.
• `n::Integer`: The maximum number of simulated steps.
• `rng::AbstractRNG`: The random number generator used for sampling.
• `verbose::Bool`: Whether to use a progress bar while sampling.

Constructors

```julia
PolicySampler(pomdp::POMDP; policy::Policy=RandomPolicy(pomdp),
    updater::Updater=DiscreteUpdater(pomdp), n::Integer=10,
    rng::AbstractRNG=Random.GLOBAL_RNG)
```

Methods

```julia
(s::PolicySampler)(pomdp::POMDP)
```

Returns a vector of unique belief states.

Example Usage

```julia-repl
julia> pomdp = TigerPOMDP();

julia> sampler = PolicySampler(pomdp; n=3);

julia> sampler(pomdp)
2-element Vector{Any}:
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
```

### ExplorationPolicy Sampler

```julia
ExplorationPolicySampler
```

Samples belief states by rolling out an `ExplorationPolicy`. Essentially identical to `PolicySampler`, except that actions are selected by an `ExplorationPolicy`.

Fields

• `explorer::ExplorationPolicy`: The `ExplorationPolicy` used for decision making.
• `on_policy::Policy`: The fallback `Policy` used for decision making when not exploring.
• `updater::Updater`: The updater used for updating beliefs.
• `n::Integer`: The maximum number of simulated steps.
• `rng::AbstractRNG`: The random number generator used for sampling.
• `verbose::Bool`: Whether to use a progress bar while sampling.

Constructors

```julia
ExplorationPolicySampler(pomdp::POMDP; rng::AbstractRNG=Random.GLOBAL_RNG,
    explorer::ExplorationPolicy=EpsGreedyPolicy(pomdp, 0.1; rng=rng),
    on_policy=RandomPolicy(pomdp),
    updater::Updater=DiscreteUpdater(pomdp), n::Integer=10)
```

Methods

```julia
(s::ExplorationPolicySampler)(pomdp::POMDP)
```

Returns a vector of unique belief states.

Example Usage

```julia-repl
julia> pomdp = TigerPOMDP();

julia> sampler = ExplorationPolicySampler(pomdp; n=30);

julia> sampler(pomdp)
3-element Vector{Any}:
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
```