FactoredValueMCTS

FactoredValueMCTS.FVMCTSSolver - Type

Factored Value Monte Carlo Tree Search solver data structure.

Fields:

n_iterations::Int64
    Number of iterations during each action() call.
    default: 100

max_time::Float64
    Maximum CPU time to spend computing an action.
    default: Inf

depth::Int64
    Maximum rollout horizon and tree depth.
    default: 100

exploration_constant::Float64
    Specifies how much the solver should explore. In the UCB equation, Q + c*sqrt(log(t)/N), c is the exploration constant.
    The exploration terms for FV-MCTS-Var-El and FV-MCTS-Max-Plus are different but the role of c is the same.
    default: 1.0

rng::AbstractRNG
    Random number generator.

estimate_value::Any (rollout policy)
    Function, object, or number used to estimate the value at the leaf nodes.
    If this is a function `f`, `f(mdp, s, depth)` will be called to estimate the value.
    If this is an object `o`, `estimate_value(o, mdp, s, depth)` will be called.
    If this is a number, the value will be set to that number.
    default: RolloutEstimator(RandomSolver(rng))

init_Q::Any
    Function, object, or number used to set the initial Q(s,a) value at a new node.
    If this is a function `f`, `f(mdp, s, a)` will be called to set the value.
    If this is an object `o`, `init_Q(o, mdp, s, a)` will be called.
    If this is a number, Q will be set to that number.
    default: 0.0

init_N::Any
    Function, object, or number used to set the initial N(s,a) value at a new node.
    If this is a function `f`, `f(mdp, s, a)` will be called to set the value.
    If this is an object `o`, `init_N(o, mdp, s, a)` will be called.
    If this is a number, N will be set to that number.
    default: 0

reuse_tree::Bool
    If this is true, the tree information is reused when calculating the next plan.
    clear_tree! can always be called to discard the tree regardless of this setting.
    default: false

coordination_strategy::AbstractCoordinationStrategy
    The specific strategy with which to compute the best joint action from the current MCTS statistics.
    default: VarEl()
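
To make the role of exploration_constant concrete, here is a minimal sketch of the plain UCB1 score from the equation above. The helpers `ucb_score` and `ucb_action` are hypothetical and not part of FactoredValueMCTS; the exploration terms the package actually uses for Var-El and Max-Plus differ, as noted in the field description.

```julia
# Hypothetical helpers, not part of FactoredValueMCTS: the standard UCB1 score.
# q: mean value estimate Q(s,a); n: visit count N(s,a);
# t: total visits to the state; c: exploration constant.
function ucb_score(q::Float64, n::Int, t::Int, c::Float64)
    n == 0 && return Inf              # unvisited actions are tried first
    return q + c * sqrt(log(t) / n)
end

# Index of the action with the highest UCB score.
function ucb_action(qs::Vector{Float64}, ns::Vector{Int}, c::Float64)
    t = sum(ns)
    scores = [ucb_score(q, n, t, c) for (q, n) in zip(qs, ns)]
    return argmax(scores)
end

qs = [1.0, 0.9]   # action 1 looks slightly better ...
ns = [50, 2]      # ... but action 2 has barely been tried

greedy  = ucb_action(qs, ns, 0.0)   # c = 0: pure exploitation
explore = ucb_action(qs, ns, 1.0)   # c = 1: the rarely tried action wins
```

With c = 0 the choice is purely greedy; a larger c shifts the choice toward actions with few visits.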

FactoredValueMCTS.MaxPlusStatistics - Type

Tracks the information and statistics needed to coordinate the joint action with Max-Plus in Factored-Value MCTS. Storing the Max-Plus parameters here is a little ugly, but coordinate_action cannot take them as arguments because VarEl does not use them.

Fields:

adjmatgraph::SimpleGraph
    The coordination graph as a Graphs SimpleGraph.

message_iters::Int64
    Number of rounds of message passing.

message_norm::Bool
    Whether to normalize the messages or not after message passing.

use_agent_utils::Bool
    Whether to include the per-agent utilities while computing the best agent action (see our paper for details).

node_exploration::Bool
    Whether to use the per-node UCB-style bonus while computing the best agent action (see our paper for details).

edge_exploration::Bool
    Whether to use the per-edge UCB-style bonus after the message passing rounds (see our paper for details). At least one of edge_exploration and node_exploration MUST be true for the search to explore.

all_states_stats::Dict{AbstractVector{S},PerStateMPStats}
    Maps each joint state in the tree to the per-state statistics.

FactoredValueMCTS.VarElStatistics - Type

Tracks the information and statistics needed to coordinate the joint action with Var-El (variable elimination) in Factored-Value MCTS.

Fields:

coord_graph_components::Vector{Vector{Int64}}
    The list of coordination graph components, i.e., cliques, where each element is a list of agent IDs that are in a mutual clique.

min_degree_ordering::Vector{Int64}
    Ordering of agent IDs by increasing degree in the coordination graph (CG). This ordering is the heuristic most typically used for the elimination order in Var-El.

n_component_stats::Dict{AbstractVector{S},Vector{Vector{Int64}}}
    Maps each joint state in the tree (for which we need to compute the UCB action) to the frequency of each component's various local actions.

q_component_stats::Dict{AbstractVector{S},Vector{Vector{Float64}}}
    Maps each joint state in the tree to the accumulated q-value of each component's various local actions.
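
As an illustration of the fields above, the following sketch computes a minimum-degree ordering and sets up per-component visit counts for one joint state. It is a sketch under stated assumptions, not the package's code: `min_degree_order` is a hypothetical helper, the graph is given as a plain adjacency list rather than a Graphs object, every agent is assumed to have two actions, and a component's local actions are assumed to be the joint actions of its agents.

```julia
# Hypothetical sketch: order agents by increasing degree in a coordination
# graph given as an adjacency list (agent i is adjacent to the agents in adj[i]).
function min_degree_order(adj::Vector{Vector{Int}})
    degrees = length.(adj)
    return sortperm(degrees)      # stable sort: ties keep agent-ID order
end

# Example graph with edges 1-2, 1-3, 1-4, 4-5.
adj = [[2, 3, 4], [1], [1], [1, 5], [4]]
order = min_degree_order(adj)     # leaves first, the hub (agent 1) last

# In this graph the maximal cliques are exactly the edges.
components = [[1, 2], [1, 3], [1, 4], [4, 5]]

n_actions = 2                     # assumption: two actions per agent
s = [0, 0, 0, 0, 0]               # a joint state, one entry per agent

# One count vector per component, one entry per local joint action.
n_stats = Dict{Vector{Int},Vector{Vector{Int}}}()
n_stats[s] = [zeros(Int, n_actions^length(c)) for c in components]
```

Eliminating low-degree agents first tends to keep the intermediate payoff tables small, which is why this ordering is a common default.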

FactoredValueMCTS.coordinate_action - Method

Runs Max-Plus at the current state using the per-state MaxPlusStatistics to compute the best joint action, using node-wise exploration bonuses, edge-wise exploration bonuses, or both. Rounds of message passing are followed by per-node maximization.
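
The two phases described above (message passing, then per-node maximization) can be sketched as follows. This is a generic, self-contained Max-Plus sketch and not the package's implementation: the function `maxplus` and its argument layout are hypothetical, the exploration bonuses are omitted, and the graph is encoded by the keys of an edge-payoff dictionary. The keyword names mirror the message_iters and message_norm fields of MaxPlusStatistics.

```julia
# Hypothetical Max-Plus sketch (exploration bonuses omitted).
# node_q[i][a]         : utility of agent i taking action a
# edge_q[(i, j)][a, b] : payoff of edge (i, j), i < j, for actions (a, b)
function maxplus(node_q::Vector{Vector{Float64}},
                 edge_q::Dict{Tuple{Int,Int},Matrix{Float64}};
                 message_iters::Int = 10, message_norm::Bool = true)
    n = length(node_q)
    nbrs = [Int[] for _ in 1:n]
    for (i, j) in keys(edge_q)
        push!(nbrs[i], j)
        push!(nbrs[j], i)
    end
    # Edge payoff as seen from i: rows are i's actions, columns are j's.
    payoff(i, j) = i < j ? edge_q[(i, j)] : permutedims(edge_q[(j, i)])
    # msg[(i, j)][b] is the message from i to j about j's action b.
    msg = Dict((i, j) => zeros(length(node_q[j])) for i in 1:n for j in nbrs[i])

    for _ in 1:message_iters
        new_msg = Dict{Tuple{Int,Int},Vector{Float64}}()
        for i in 1:n, j in nbrs[i]
            # Everything i knows, except what j itself told i.
            incoming = copy(node_q[i])
            for k in nbrs[i]
                k == j && continue
                incoming .+= msg[(k, i)]
            end
            # Maximize over i's action for each action of j.
            m = vec(maximum(payoff(i, j) .+ incoming, dims = 1))
            message_norm && (m .-= sum(m) / length(m))
            new_msg[(i, j)] = m
        end
        msg = new_msg
    end

    # Per-node maximization over own utility plus all incoming messages.
    best = Vector{Int}(undef, n)
    for i in 1:n
        total = copy(node_q[i])
        for k in nbrs[i]
            total .+= msg[(k, i)]
        end
        best[i] = argmax(total)
    end
    return best
end

# Chain 1 - 2 - 3 with two actions per agent and no node utilities.
node_q = [zeros(2) for _ in 1:3]
edge_q = Dict((1, 2) => [1.0 0.0; 0.0 2.0],
              (2, 3) => [5.0 0.0; 0.0 1.0])
joint = maxplus(node_q, edge_q)
```

On this chain the large payoff on edge (2, 3) pulls all three agents to action 1, even though edge (1, 2) alone would prefer action 2. On tree-structured graphs such as this one Max-Plus recovers the optimal joint action; on graphs with cycles it is an approximation, which is one reason message normalization is offered as an option.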

FactoredValueMCTS.coordinate_action - Method

Runs variable elimination at the current state using the VarElStatistics to compute the best joint action with the component-wise exploration bonus. Note: the implementation is rather complicated.

FactoredValueMCTS.maxplus_joint_mcts_planner - Method

Called internally in solve() to create the FVMCTSPlanner where Max-Plus is the specific action coordination strategy. Creates MaxPlusStatistics and assumes the various Max-Plus flags are passed down from the CoordinationStrategy object given to the solver.

FactoredValueMCTS.update_statistics! - Method

Take the q-value from the MCTS step and distribute the updates across the per-node and per-edge q-stats as per the formula in our paper.

FactoredValueMCTS.update_statistics! - Method

Take the q-value from the MCTS step and distribute the updates across the component q-stats as per the formula in the Amato-Oliehoek paper.

FactoredValueMCTS.varel_joint_mcts_planner - Method

Called internally in solve() to create the FVMCTSPlanner where Var-El is the specific action coordination strategy. Creates VarElStatistics internally with the CG components and the minimum degree ordering heuristic.