Samplers

Defining a Sampler

In this section, we outline the requirements and guidelines for defining a belief Sampler.

Interface

The Sampler interface has only one method: the functor. For example, to implement your own Sampler, you could write something like this:

struct MySampler <: Sampler
    foo
    bar
end

# functor definition
function (c::MySampler)(pomdp::POMDP)
    # YOUR CODE HERE
    return sampled_beliefs
end
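As a hedged illustration of this pattern (not part of the package), the sketch below fills in the functor with a hypothetical `RandomSampler` that rolls out random actions and collects the updated beliefs. It assumes POMDPs.jl, POMDPTools.jl, and CompressedBeliefMDPs.jl are loaded; the name `RandomSampler` and its fields are assumptions for the example.

```julia
using POMDPs, POMDPTools
using CompressedBeliefMDPs

# Hypothetical sampler: rolls out random actions and returns the unique
# beliefs visited along the way.
struct RandomSampler <: Sampler
    updater::Updater
    n::Int
end

function (s::RandomSampler)(pomdp::POMDP)
    beliefs = []
    b = initialize_belief(s.updater, initialstate(pomdp))
    st = rand(initialstate(pomdp))
    for _ in 1:s.n
        push!(beliefs, b)
        a = rand(actions(pomdp))
        sp, o = @gen(:sp, :o)(pomdp, st, a)
        b = update(s.updater, b, a, o)
        # restart the rollout if we hit a terminal state
        st = isterminal(pomdp, sp) ? rand(initialstate(pomdp)) : sp
    end
    return unique(beliefs)
end
```

A sampler defined this way could then be called like any of the built-in samplers, e.g. `RandomSampler(DiscreteUpdater(pomdp), 10)(pomdp)`.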

Implemented Samplers

CompressedBeliefMDPs provides the following generic belief samplers:

Exploratory Belief Expansion

CompressedBeliefMDPs.BeliefExpansionSampler — Type
BeliefExpansionSampler

Fast extension of exploratory belief expansion (Algorithm 21.13 in Algorithms for Decision Making) that uses $k$-d trees.

Fields

  • updater::Updater: The updater used to update beliefs.
  • metric::NearestNeighbors.MinkowskiMetric: The metric used to measure distances between beliefs. It must be a Minkowski metric.
  • n::Integer: The number of belief expansions to perform.

Constructors

BeliefExpansionSampler(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp),
metric::NearestNeighbors.MinkowskiMetric=Euclidean(), n::Integer=3)

Methods

(s::BeliefExpansionSampler)(pomdp::POMDP)

Creates an initial belief and performs exploratory belief expansion. Returns the unique belief states. Only works for POMDPs with discrete state, action, and observation spaces.

Example Usage

julia> pomdp = TigerPOMDP();
julia> sampler = BeliefExpansionSampler(pomdp; n=2);
julia> beliefs = sampler(pomdp)
Set{DiscreteBelief{TigerPOMDP, Bool}} with 4 elements:
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
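The expansion idea behind BeliefExpansionSampler can be illustrated with a self-contained toy, sketched below. The observation model and all names here are simplified assumptions, not the package's implementation; the real sampler additionally uses $k$-d trees (via NearestNeighbors.jl) to speed up the nearest-neighbor query.

```julia
using LinearAlgebra

# Toy belief update over two states: a hypothetical 85%-accurate binary
# observation, loosely in the spirit of the tiger problem.
function toy_successors(b::Vector{Float64})
    p = 0.85
    left  = normalize([p * b[1], (1 - p) * b[2]], 1)
    right = normalize([(1 - p) * b[1], p * b[2]], 1)
    return [left, right]
end

# Greedy expansion: n times, add the successor belief that is farthest
# (in Euclidean distance) from its nearest neighbor in the current set.
function expand(beliefs::Vector{Vector{Float64}}, n::Int)
    B = copy(beliefs)
    for _ in 1:n
        best, best_d = nothing, 0.0
        for b in B, b2 in toy_successors(b)
            d = minimum(norm(b2 - x) for x in B)
            if d > best_d
                best, best_d = b2, d
            end
        end
        best === nothing && break  # no new belief found; stop early
        push!(B, best)
    end
    return B
end

B = expand([[0.5, 0.5]], 2)
```

Starting from the uniform belief, two expansions add the two post-observation beliefs `[0.85, 0.15]` and `[0.15, 0.85]`, mirroring the kind of set produced in the example above.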

Policy Sampler

CompressedBeliefMDPs.PolicySampler — Type
PolicySampler

Samples belief states by rolling out a Policy.

Fields

  • policy::Policy: The policy used for decision making.
  • updater::Updater: The updater used for updating beliefs.
  • n::Integer: The maximum number of simulated steps.
  • rng::AbstractRNG: The random number generator used for sampling.

Constructors

PolicySampler(pomdp::POMDP; policy::Policy=RandomPolicy(pomdp), 
updater::Updater=DiscreteUpdater(pomdp), n::Integer=10, 
rng::AbstractRNG=Random.GLOBAL_RNG)

Methods

(s::PolicySampler)(pomdp::POMDP)

Returns a vector of unique belief states.

Example Usage

julia> pomdp = TigerPOMDP();
julia> sampler = PolicySampler(pomdp; n=3);
julia> sampler(pomdp)
2-element Vector{Any}:
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])

ExplorationPolicy Sampler

CompressedBeliefMDPs.ExplorationPolicySampler — Type
ExplorationPolicySampler

Samples belief states by rolling out an ExplorationPolicy. Essentially identical to PolicySampler, except the rollout follows an ExplorationPolicy.

Fields

  • explorer::ExplorationPolicy: The ExplorationPolicy used for decision making.
  • on_policy::Policy: The fallback Policy used for decision making when not exploring.
  • updater::Updater: The updater used for updating beliefs.
  • n::Integer: The maximum number of simulated steps.
  • rng::AbstractRNG: The random number generator used for sampling.

Constructors

ExplorationPolicySampler(pomdp::POMDP; rng::AbstractRNG=Random.GLOBAL_RNG,
explorer::ExplorationPolicy=EpsGreedyPolicy(pomdp, 0.1; rng=rng), on_policy=RandomPolicy(pomdp),
updater::Updater=DiscreteUpdater(pomdp), n::Integer=10)

Methods

(s::ExplorationPolicySampler)(pomdp::POMDP)

Returns a vector of unique belief states.

Example Usage

julia> pomdp = TigerPOMDP();
julia> sampler = ExplorationPolicySampler(pomdp; n=30);
julia> sampler(pomdp)
3-element Vector{Any}:
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])