FactoredValueMCTS
FactoredValueMCTS.FVMCTSSolver
FactoredValueMCTS.FactoredRandomPolicy
FactoredValueMCTS.MaxPlusStatistics
FactoredValueMCTS.VarElStatistics
FactoredValueMCTS.coordinate_action
FactoredValueMCTS.coordinate_action
FactoredValueMCTS.maxplus_joint_mcts_planner
FactoredValueMCTS.update_statistics!
FactoredValueMCTS.update_statistics!
FactoredValueMCTS.varel_joint_mcts_planner
FactoredValueMCTS.FVMCTSSolver
— Type
Factored Value Monte Carlo Tree Search solver data structure.
Fields:
n_iterations::Int64
Number of iterations during each action() call.
default: 100
max_time::Float64
Maximum CPU time to spend computing an action.
default: Inf
depth::Int64
Maximum rollout horizon and tree depth.
default: 100
exploration_constant::Float64
Specifies how much the solver should explore. In the UCB equation, Q + c*sqrt(log(t)/N), c is the exploration constant.
The exploration terms for FV-MCTS-Var-El and FV-MCTS-Max-Plus are different but the role of c is the same.
default: 1.0
rng::AbstractRNG
Random number generator
estimate_value::Any (rollout policy)
Function, object, or number used to estimate the value at the leaf nodes.
If this is a function `f`, `f(mdp, s, depth)` will be called to estimate the value.
If this is an object `o`, `estimate_value(o, mdp, s, depth)` will be called.
If this is a number, the value will be set to that number.
default: RolloutEstimator(RandomSolver(rng))
init_Q::Any
Function, object, or number used to set the initial Q(s,a) value at a new node.
If this is a function `f`, `f(mdp, s, a)` will be called to set the value.
If this is an object `o`, `init_Q(o, mdp, s, a)` will be called.
If this is a number, Q will be set to that number.
default: 0.0
init_N::Any
Function, object, or number used to set the initial N(s,a) value at a new node.
If this is a function `f`, `f(mdp, s, a)` will be called to set the value.
If this is an object `o`, `init_N(o, mdp, s, a)` will be called.
If this is a number, N will be set to that number.
default: 0
reuse_tree::Bool
If this is true, the tree information is re-used for calculating the next plan.
Of course, clear_tree! can always be called to override this.
default: false
coordination_strategy::AbstractCoordinationStrategy
The specific strategy with which to compute the best joint action from the current MCTS statistics.
default: VarEl()
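To make the role of exploration_constant concrete, here is a minimal Python sketch of the plain UCB rule quoted above, Q + c*sqrt(log(t)/N). It is an illustration only, not the package's Julia code: the names `ucb_score` and `select_action` are hypothetical, and the factored exploration terms used by FV-MCTS-Var-El and FV-MCTS-Max-Plus differ from this flat version.

```python
import math

def ucb_score(q, n, total_n, c=1.0):
    """Flat UCB1 score Q + c*sqrt(log(t)/N); unvisited actions score +inf."""
    if n == 0:
        return math.inf
    return q + c * math.sqrt(math.log(total_n) / n)

def select_action(q_values, counts, c=1.0):
    """Return the index of the action with the highest UCB score."""
    total_n = sum(counts)
    scores = [ucb_score(q, n, total_n, c) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With c = 0 the rule is purely greedy in Q; larger values of c increasingly favor rarely tried actions.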
FactoredValueMCTS.FactoredRandomPolicy
— Type
Random policy factored for each agent. Avoids the exploding joint action space.
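The idea can be sketched in a few lines of Python: rather than enumerating the joint action space (whose size is the product of the per-agent action counts), each agent samples its own local action independently. The function name `factored_random_action` and the list-of-lists input are assumptions made for this illustration, not the package's interface.

```python
import random

def factored_random_action(agent_actions, rng=random):
    """Sample one local action per agent; the joint action is the tuple.

    agent_actions: one list of local actions per agent (hypothetical layout).
    Cost is linear in the number of agents, not in the joint action count.
    """
    return tuple(rng.choice(actions) for actions in agent_actions)
```

For example, ten agents with five actions each have 5^10 joint actions, yet sampling this way needs only ten draws.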
FactoredValueMCTS.MaxPlusStatistics
— Type
Tracks the specific information and statistics we need to use Max-Plus to coordinate_action the joint action in Factored-Value MCTS. Putting parameters here is a little ugly, but coordinate_action can't have them since VarEl doesn't use those args.
Fields:
adjmatgraph::SimpleGraph
The coordination graph as a Graphs SimpleGraph.
message_iters::Int64
Number of rounds of message passing.
message_norm::Bool
Whether to normalize the messages or not after message passing.
use_agent_utils::Bool
Whether to include the per-agent utilities while computing the best agent action (see our paper for details)
node_exploration::Bool
Whether to use the per-node UCB style bonus while computing the best agent action (see our paper for details)
edge_exploration::Bool
Whether to use the per-edge UCB style bonus after the message passing rounds (see our paper for details). At least one of this and node_exploration MUST be true for any exploration to happen.
all_states_stats::Dict{AbstractVector{S},PerStateMPStats}
Maps each joint state in the tree to the per-state statistics.
FactoredValueMCTS.VarElStatistics
— Type
Tracks the specific information and statistics we need to use Var-El to coordinate_action the joint action in Factored-Value MCTS.
Fields:
coordgraphcomponents::Vector{Vector{Int64}}
The list of coordination graph components, i.e., cliques, where each element is a list of agent IDs that are in a mutual clique.
min_degree_ordering::Vector{Int64}
Ordering of agent IDs in increasing CG degree. This ordering is the heuristic most typically used for the elimination order in Var-El.
n_component_stats::Dict{AbstractVector{S},Vector{Vector{Int64}}}
Maps each joint state in the tree (for which we need to compute the UCB action) to the frequency of each component's various local actions.
q_component_stats::Dict{AbstractVector{S},Vector{Vector{Float64}}}
Maps each joint state in the tree to the accumulated q-value of each component's various local actions.
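As an illustration of the min_degree_ordering field, the following Python sketch sorts agents by their degree in the coordination graph. It assumes 0-based agent IDs and an undirected edge list, and `min_degree_ordering` here is a hypothetical helper, not the package's function; ties are broken by agent ID because Python's sort is stable.

```python
def min_degree_ordering(n_agents, edges):
    """Return agent IDs sorted by increasing coordination-graph degree.

    edges: iterable of undirected (i, j) pairs with 0-based agent IDs.
    """
    degree = [0] * n_agents
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    # sorted() is stable, so agents of equal degree stay in ID order.
    return sorted(range(n_agents), key=lambda agent: degree[agent])
```

For the edge list [(0, 1), (0, 2), (0, 3), (1, 2)], the degrees are 3, 2, 2, 1, so low-degree agents are eliminated first and the hub agent 0 last.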
FactoredValueMCTS.coordinate_action
— Method
Runs Max-Plus at the current state using the per-state MaxPlusStatistics to compute the best joint action with either or both of node-wise and edge-wise exploration bonus. Rounds of message passing are followed by per-node maximization.
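The two phases described above (message passing, then per-node maximization) can be sketched in Python as follows. This is a generic Max-Plus illustration under stated assumptions, not the package's implementation: payoffs are given as plain per-node and per-edge tables, the exploration bonuses are omitted, and the names `max_plus`, `node_q`, and `edge_q` are hypothetical.

```python
def max_plus(n_actions, node_q, edge_q, iters=10, normalize=True):
    """Approximate the best joint action on a pairwise coordination graph.

    n_actions: number of local actions per agent
    node_q:    node_q[i][a] is agent i's payoff for local action a
    edge_q:    {(i, j): table} with table[a_i][a_j] the edge payoff
    """
    agents = range(len(n_actions))
    neighbors = {i: [] for i in agents}
    for i, j in edge_q:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def payoff(i, j, ai, aj):
        # Edge tables are stored once; look them up in either direction.
        if (i, j) in edge_q:
            return edge_q[(i, j)][ai][aj]
        return edge_q[(j, i)][aj][ai]

    # msg[(i, j)][aj]: what agent i tells agent j about j's action aj.
    msg = {(i, j): [0.0] * n_actions[j] for i in agents for j in neighbors[i]}
    for _ in range(iters):
        new_msg = {}
        for i, j in msg:
            out = []
            for aj in range(n_actions[j]):
                out.append(max(
                    node_q[i][ai] + payoff(i, j, ai, aj)
                    + sum(msg[(k, i)][ai] for k in neighbors[i] if k != j)
                    for ai in range(n_actions[i])))
            if normalize:
                # Subtracting the mean keeps messages bounded on loopy graphs.
                mean = sum(out) / len(out)
                out = [m - mean for m in out]
            new_msg[(i, j)] = out
        msg = new_msg

    # Per-node maximization over local payoff plus incoming messages.
    joint = []
    for i in agents:
        totals = [node_q[i][a] + sum(msg[(k, i)][a] for k in neighbors[i])
                  for a in range(n_actions[i])]
        joint.append(max(range(n_actions[i]), key=totals.__getitem__))
    return joint
```

On tree-structured graphs this procedure recovers the exact maximizer when it is unique; on graphs with cycles it is only an approximation, which is one reason a message-normalization option is useful.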
FactoredValueMCTS.coordinate_action
— Method
Runs variable elimination at the current state using the VarElStatistics to compute the best joint action with the component-wise exploration bonus. FYI: Rather complicated.
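Since the method is described as rather complicated, a stripped-down Python sketch of generic max-sum variable elimination may help. It is not the package's code: it assumes payoffs are given as explicit factor tables, leaves out the component-wise exploration bonus, and uses the hypothetical names `variable_elimination`, `factors`, and `order`.

```python
from itertools import product

def variable_elimination(n_actions, factors, order):
    """Maximize a sum of local payoff factors by eliminating agents in turn.

    n_actions: {agent: number of local actions}
    factors:   list of (scope, table); scope is a tuple of agents and table
               maps a tuple of their actions (in scope order) to a payoff
    order:     elimination order covering every agent
    Returns (joint_action_dict, best_value).
    """
    factors = list(factors)
    responses = []  # (agent, other agents, best action per context)
    for agent in order:
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        others = tuple(sorted(
            {a for scope, _ in involved for a in scope} - {agent}))
        new_table, response = {}, {}
        for assignment in product(*(range(n_actions[a]) for a in others)):
            ctx = dict(zip(others, assignment))
            best_val, best_act = None, None
            for act in range(n_actions[agent]):
                ctx[agent] = act
                val = sum(table[tuple(ctx[a] for a in scope)]
                          for scope, table in involved)
                if best_val is None or val > best_val:
                    best_val, best_act = val, act
            new_table[assignment] = best_val
            response[assignment] = best_act
        # The eliminated agent is replaced by a factor over its neighbors.
        factors.append((others, new_table))
        responses.append((agent, others, response))

    # Every remaining factor has an empty scope; their sum is the optimum.
    best_value = sum(table[()] for _, table in factors)

    # Back-substitute in reverse order to recover each agent's action.
    joint = {}
    for agent, others, response in reversed(responses):
        joint[agent] = response[tuple(joint[a] for a in others)]
    return joint, best_value
```

The cost of each elimination step is exponential in the number of neighbors the eliminated agent has at that point, which is why a good elimination order such as a minimum-degree heuristic matters.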
FactoredValueMCTS.maxplus_joint_mcts_planner
— Method
Called internally in solve() to create the FVMCTSPlanner where Max-Plus is the specific action coordination strategy. Creates MaxPlusStatistics and assumes the various MP flags are sent down from the CoordinationStrategy object given to the solver.
FactoredValueMCTS.update_statistics!
— Method
Take the q-value from the MCTS step and distribute the updates across the per-node and per-edge q-stats as per the formula in our paper.
FactoredValueMCTS.update_statistics!
— Method
Take the q-value from the MCTS step and distribute the updates across the component q-stats as per the formula in the Amato-Oliehoek paper.
FactoredValueMCTS.varel_joint_mcts_planner
— Method
Called internally in solve() to create the FVMCTSPlanner where Var-El is the specific action coordination strategy. Creates VarElStatistics internally with the CG components and the minimum degree ordering heuristic.