POMCPSolver(#=keyword arguments=#)

Partially Observable Monte Carlo Planning Solver.

Keyword Arguments

  • max_depth::Int Rollouts and tree expansion will stop when this depth is reached. default: 20

  • c::Float64 UCB exploration constant - specifies how much the solver should explore. default: 1.0

  • tree_queries::Int Number of tree search iterations run during each action() call. default: 1000

  • max_time::Float64 Maximum time (in seconds) for planning in each action() call. default: Inf

  • tree_in_info::Bool If true, returns the tree in the info dict when action_info is called. default: false

  • estimate_value::Any Function, object, or number used to estimate the value at the leaf nodes. default: RolloutEstimator(RandomSolver(rng))

    • If this is a function f, f(pomdp, s, h::BeliefNode, steps) will be called to estimate the value.
    • If this is an object o, estimate_value(o, pomdp, s, h::BeliefNode, steps) will be called.
    • If this is a number, the value will be set to that number.

    Note: In many cases, the simplest way to estimate the value is to do a rollout on the fully observable MDP with a policy that is a function of the state. To do this, use FORollout(policy).

  • default_action::Any Function, action, or Policy used to determine the action if POMCP fails with exception ex. default: ExceptionRethrow()

    • If this is a Function f, f(pomdp, belief, ex) will be called.
    • If this is a Policy p, action(p, belief) will be called.
    • If this is an object a, default_action(a, pomdp, belief, ex) will be called, and if this method is not implemented, a will be returned directly.

  • rng::AbstractRNG Random number generator. default: Random.GLOBAL_RNG
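
Example

As a minimal sketch of how these keyword arguments fit together, the following builds a solver for the TigerPOMDP test problem. It assumes POMDPs.jl, POMDPTools.jl, POMDPModels.jl, and BasicPOMCP.jl are installed. The problem choice, the rollout policy that always takes the first action, the fallback action, and the parameter values are illustrative assumptions, not recommended settings.

```julia
using POMDPs
using POMDPTools    # FunctionPolicy, action_info
using POMDPModels   # TigerPOMDP, used here only as a small example problem
using BasicPOMCP
using Random

pomdp = TigerPOMDP()

# Illustrative rollout policy on the fully observable MDP: a function of the
# state that always returns the first action in the action space.
rollout_policy = FunctionPolicy(s -> first(actions(pomdp)))

solver = POMCPSolver(
    max_depth = 20,
    c = 10.0,
    tree_queries = 500,
    tree_in_info = true,
    estimate_value = FORollout(rollout_policy),
    # Function form of default_action: called as f(pomdp, belief, ex)
    # if planning fails with exception ex.
    default_action = (pomdp, belief, ex) -> first(actions(pomdp)),
    rng = MersenneTwister(1),
)

planner = solve(solver, pomdp)
b0 = initialstate(pomdp)

# action_info returns the chosen action together with an info dict.
a, info = action_info(planner, b0)
tree = info[:tree]   # present because tree_in_info = true
```

Passing a number such as estimate_value = 0.0 instead of FORollout(rollout_policy) would skip rollouts entirely and assign that constant to every leaf.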

extract_belief(rollout_updater::POMDPs.Updater, node::BeliefNode)

Return a belief compatible with the rollout_updater from the belief in node.

When a rollout simulation is started, this function is used to create the initial belief (compatible with rollout_updater) based on the appropriate BeliefNode at the edge of the tree. By overriding this, a belief can be constructed based on the entire tree or entire observation-action history.
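
Example

A hedged sketch of overriding this function for a custom updater follows. ConstantBeliefUpdater is a hypothetical type introduced only for illustration; it ignores the node and always starts rollouts from a stored belief. A real override would more likely inspect the node to rebuild a belief from the tree or the observation-action history.

```julia
using POMDPs
using BasicPOMCP

# Hypothetical updater type (not part of any package), for illustration only.
struct ConstantBeliefUpdater{B} <: POMDPs.Updater
    b::B   # belief handed to every rollout
end

# Return a belief compatible with ConstantBeliefUpdater, ignoring the node.
function BasicPOMCP.extract_belief(up::ConstantBeliefUpdater, node::BasicPOMCP.BeliefNode)
    return up.b
end
```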

estimate_value(estimator, problem::POMDPs.POMDP, start_state, h::BeliefNode, steps::Int)

Return an initial unbiased estimate of the value at belief node h.

By default, this runs a rollout simulation.
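
Example

To illustrate the object form of the estimator, the sketch below defines a method of estimate_value for a custom type and passes an instance to the solver. ConstantEstimate is a hypothetical type used only for this example; it returns a fixed value at every leaf rather than running a rollout, so it is not an unbiased estimate in general.

```julia
using POMDPs
using BasicPOMCP

# Hypothetical estimator type (not part of any package), for illustration only.
struct ConstantEstimate
    v::Float64
end

# Called by the planner at leaf nodes; the arguments follow the signature above.
function BasicPOMCP.estimate_value(e::ConstantEstimate, pomdp::POMDPs.POMDP,
                                   start_state, h::BasicPOMCP.BeliefNode, steps::Int)
    return e.v
end

solver = POMCPSolver(estimate_value = ConstantEstimate(0.0))
```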