ARDESPOT.DESPOTSolver — Type
DESPOTSolver(<keyword arguments>)
Implementation of the ARDESPOT solver, written to closely match the pseudocode in:
http://bigbird.comp.nus.edu.sg/m2ap/wordpress/wp-content/uploads/2017/08/jair14.pdf
Each field may be set via keyword argument. The fields that correspond to algorithm parameters match the definitions in the paper exactly.
Fields
epsilon_0
xi
K
D
lambda
T_max
max_trials
bounds
default_action
rng
random_source
bounds_warnings
tree_in_info
Further information can be found in the field docstrings (e.g. ?DESPOTSolver.xi).
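As a rough illustration of setting fields by keyword, the sketch below builds a solver with a few fields overridden. It assumes the ARDESPOT package is installed and that IndependentBounds accepts plain numbers as bounds. The numeric values are arbitrary choices for the example, not recommended settings.

```julia
using ARDESPOT

# All values below are illustrative assumptions, not tuned settings.
solver = DESPOTSolver(
    K = 100,        # number of sampled scenarios
    D = 20,         # maximum depth of the search tree
    T_max = 0.5,    # planning time budget per step, in seconds
    bounds = IndependentBounds(-20.0, 0.0, check_terminal=true),
)

# The keyword values end up in ordinary struct fields.
println((solver.K, solver.D, solver.T_max))
```

Any field left out of the call keeps its default value.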
ARDESPOT.DefaultPolicyLB — Type
DefaultPolicyLB(policy; max_depth=nothing, final_value=(m,x)->0.0)
DefaultPolicyLB(solver; max_depth=nothing, final_value=(m,x)->0.0)
A lower bound calculated by running a default policy on the scenarios in a belief.
Keyword Arguments
max_depth::Union{Nothing,Int}=nothing: max depth to run the simulation. The depth of the belief will be automatically subtracted, so simulations for the bound will be run for max_depth-b.depth steps. If nothing, the solver's max depth will be used.
final_value=(m,x)->0.0: a function (or callable object) that specifies an additional value to be added at the end of the simulation when max_depth is reached. This function will be called with two arguments: a POMDP and a ScenarioBelief. It will not be called when the states in the belief are terminal.
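The interaction between max_depth, the belief depth, and final_value can be sketched with a toy model. This is not the package's implementation. The integer states, the is_terminal rule, the fixed reward of -1.0 per step, the absence of discounting, the averaging over scenarios, and the names rollout and default_policy_lower_bound are all assumptions made for illustration. In particular, final_value here takes a single state rather than a POMDP and a ScenarioBelief.

```julia
# Hypothetical toy model: a state is an integer, the default policy always
# steps by +1, each step earns a reward of -1.0, and states >= 3 are terminal.
is_terminal(s::Int) = s >= 3

# Roll the default policy out for at most `steps` steps from state `s`.
# `final_value` is only consulted when the step budget runs out first.
function rollout(s::Int, steps::Int, final_value)
    total = 0.0
    for _ in 1:steps
        is_terminal(s) && return total   # terminal: final_value is not called
        total += -1.0
        s += 1
    end
    return is_terminal(s) ? total : total + final_value(s)
end

# Average the rollout return over the scenarios (here, just start states).
function default_policy_lower_bound(scenarios, belief_depth;
                                    max_depth, final_value = s -> 0.0)
    steps = max_depth - belief_depth     # the belief's depth is subtracted
    return sum(rollout(s, steps, final_value) for s in scenarios) / length(scenarios)
end

# A belief at depth 1 with max_depth=3 leaves 2 simulation steps.
lb_default = default_policy_lower_bound([0, 2], 1; max_depth=3)
lb_penalty = default_policy_lower_bound([0, 2], 1; max_depth=3,
                                        final_value = s -> -10.0)
println((lb_default, lb_penalty))
```

In this sketch the scenario starting at 2 reaches a terminal state within its budget, so only the scenario starting at 0 picks up the final_value term.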
ARDESPOT.IndependentBounds — Type
IndependentBounds(lower, upper, check_terminal=false, consistency_fix_thresh=0.0)
Specify lower and upper bounds that are independent of each other (the most common case).
Keyword Arguments
check_terminal::Bool=false: if true, then if all the states in the belief are terminal, the upper and lower bounds will be overridden and set to 0.
consistency_fix_thresh::Float64=0.0: if upper < lower and upper >= lower-consistency_fix_thresh, then upper will be bumped up to lower.
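The two rules above can be written out as a small standalone function. This is a sketch of the documented behaviour only, not the package's code. The helper name fix_bounds and the all_terminal flag are hypothetical and stand in for whatever the solver checks internally.

```julia
# Hypothetical helper mirroring the documented rules; not part of ARDESPOT.
function fix_bounds(lower::Float64, upper::Float64;
                    check_terminal::Bool = false,
                    all_terminal::Bool = false,
                    consistency_fix_thresh::Float64 = 0.0)
    # Rule 1: with check_terminal set, a fully terminal belief gets (0, 0).
    if check_terminal && all_terminal
        return (0.0, 0.0)
    end
    # Rule 2: a slightly inconsistent upper bound is raised to the lower bound.
    if upper < lower && upper >= lower - consistency_fix_thresh
        upper = lower
    end
    return (lower, upper)
end

println(fix_bounds(1.0, 0.95; consistency_fix_thresh = 0.1))  # gap within threshold
println(fix_bounds(1.0, 0.5; consistency_fix_thresh = 0.1))   # gap too large, unchanged
println(fix_bounds(-3.0, 4.0; check_terminal = true, all_terminal = true))
```

With the default threshold of 0.0 the second rule never fires, so an upper bound below the lower bound is left as it is.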
ARDESPOT.calc_L — Method
Return a vector of lower bounds L of length lenb+lenba, with b nodes first followed by ba nodes.
ARDESPOT.fill_L! — Method
Fill all the elements of the cache for b and children of b and return L[b].