

Factored-Value Monte Carlo Tree Search (FV-MCTS) solver data structure.

Fields:

n_iterations::Int64
    Number of iterations during each action() call.
    default: 100

    Maximum CPU time to spend computing an action.

    Number of iterations during each action() call.
    default: 100

    Specifies how much the solver should explore. In the UCB equation, Q + c*sqrt(log(t)/N), c is the exploration constant, t is the total visit count of the node, and N is the visit count of the action.
    The exploration terms for FV-MCTS-Var-El and FV-MCTS-Max-Plus are different but the role of c is the same.
    default: 1.0
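
To make the role of c concrete, here is a minimal sketch of the standard UCB1 score, Q + c*sqrt(log(t)/N). The helper names `ucb_score` and `best_ucb_action` are hypothetical and not part of the solver; the FV-MCTS variants use different exploration terms, so treat this only as an illustration of how c trades off exploitation and exploration.

```julia
# Hypothetical helper, not part of the solver's API: the standard UCB1 score.
# q: value estimate Q(s,a); n: visit count N(s,a);
# t: total visit count of the state; c: exploration constant.
function ucb_score(q::Float64, n::Int, t::Int, c::Float64)
    n == 0 && return Inf          # untried actions are selected first
    return q + c * sqrt(log(t) / n)
end

# Return the index of the action with the highest UCB score.
function best_ucb_action(qs::Vector{Float64}, ns::Vector{Int}, c::Float64)
    t = sum(ns)
    scores = [ucb_score(qs[i], ns[i], t, c) for i in eachindex(qs)]
    return argmax(scores)
end

qs = [1.0, 0.9]   # action 1 looks slightly better ...
ns = [50, 5]      # ... but action 2 has been tried far less often
greedy  = best_ucb_action(qs, ns, 0.0)   # c = 0: pure exploitation
curious = best_ucb_action(qs, ns, 1.0)   # c = 1: the bonus favors action 2
```

With c = 0 the score reduces to Q and the first action is chosen; with c = 1 the bonus for the rarely visited second action outweighs its slightly lower Q.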

    Random number generator

estimate_value::Any (rollout policy)
    Function, object, or number used to estimate the value at the leaf nodes.
    If this is a function `f`, `f(mdp, s, depth)` will be called to estimate the value.
    If this is an object `o`, `estimate_value(o, mdp, s, depth)` will be called.
    If this is a number, the value will be set to that number.
    default: RolloutEstimator(RandomSolver(rng))

    Function, object, or number used to set the initial Q(s,a) value at a new node.
    If this is a function `f`, `f(mdp, s, a)` will be called to set the value.
    If this is an object `o`, `init_Q(o, mdp, s, a)` will be called.
    If this is a number, Q will be set to that number.
    default: 0.0

    Function, object, or number used to set the initial N(s,a) value at a new node.
    If this is a function `f`, `f(mdp, s, a)` will be called to set the value.
    If this is an object `o`, `init_N(o, mdp, s, a)` will be called.
    If this is a number, N will be set to that number.
    default: 0
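
The function / object / number convention used by the three fields above maps naturally onto Julia's multiple dispatch. The sketch below shows the leaf-value case; the names `leaf_value`, `estimate`, and `ConstantRollout` are hypothetical stand-ins and not the package's own methods. The initial Q and N values would follow the same pattern with `(mdp, s, a)` arguments.

```julia
# Hypothetical names throughout; this only mirrors the three cases described above.
struct ConstantRollout   # stand-in for an estimator object
    value::Float64
end

# Case 1: a function f is called as f(mdp, s, depth).
leaf_value(f::Function, mdp, s, depth) = f(mdp, s, depth)
# Case 2: a number is used directly as the value.
leaf_value(x::Number, mdp, s, depth) = convert(Float64, x)
# Case 3: any other object is forwarded to a method that takes the object first.
leaf_value(o, mdp, s, depth) = estimate(o, mdp, s, depth)

estimate(o::ConstantRollout, mdp, s, depth) = o.value

mdp, s = nothing, 1   # placeholders for a real problem and state
v_fn  = leaf_value((m, st, d) -> 0.5 * d, mdp, s, 4)
v_num = leaf_value(3, mdp, s, 4)
v_obj = leaf_value(ConstantRollout(-1.0), mdp, s, 4)
```

Because `Function` and `Number` are more specific than the untyped fallback, Julia selects the right case without any explicit type checks.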

    If this is true, the search tree is reused when computing the next plan.
    clear_tree! can always be called to discard the tree and override this.
    default: false

    The specific strategy with which to compute the best joint action from the current MCTS statistics.
    default: VarEl()

Tracks the information and statistics needed to use Max-Plus to coordinate the joint action (via coordinate_action) in Factored-Value MCTS. Keeping the parameters here is a little ugly, but coordinate_action cannot take them as arguments because VarEl does not use them.

Fields:

adjmatgraph::SimpleGraph
    The coordination graph as a Graphs SimpleGraph.

    Number of rounds of message passing.

    Whether to normalize the messages or not after message passing.

    Whether to include the per-agent utilities while computing the best agent action (see our paper for details).

    Whether to use the per-node UCB-style bonus while computing the best agent action (see our paper for details).

    Whether to use the per-edge UCB-style bonus after the message passing rounds (see our paper for details). At least one of this flag and node_exploration MUST be true; otherwise no exploration bonus is applied.

    Maps each joint state in the tree to the per-state statistics.

Tracks the information and statistics needed to use Var-El to coordinate the joint action (via coordinate_action) in Factored-Value MCTS.

Fields:

coordgraphcomponents::Vector{Vector{Int64}}
    The list of coordination graph components, i.e., cliques, where each element is a list of agent IDs that are in a mutual clique.

    Ordering of agent IDs by increasing coordination graph (CG) degree. This ordering is the heuristic most commonly used for the elimination order in Var-El.

    Maps each joint state in the tree (for which we need to compute the UCB action) to the frequency of each component's various local actions.

    Maps each joint state in the tree to the accumulated q-value of each component's various local actions.
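
As an illustration of the ordering by increasing coordination graph degree described in the fields above, the following sketch derives it from a list of components. The function name `min_degree_ordering` and its signature are assumptions made for this example; the package may compute the ordering differently (for instance, directly from a graph object).

```julia
# Hypothetical helper: derive per-agent degrees from a list of cliques and
# sort agent IDs by increasing degree.
function min_degree_ordering(components::Vector{Vector{Int}}, n_agents::Int)
    neighbors = [Set{Int}() for _ in 1:n_agents]
    for comp in components, i in comp, j in comp
        i != j && push!(neighbors[i], j)   # agents in the same clique are neighbors
    end
    degrees = length.(neighbors)
    return sortperm(degrees)   # sortperm is stable, so ties keep agent ID order
end

# Edges 1 - 2, 2 - 3 and 2 - 4: agent 2 has degree 3, the others degree 1.
components = [[1, 2], [2, 3], [2, 4]]
order = min_degree_ordering(components, 4)
```

Here the low-degree agents 1, 3 and 4 are eliminated first and the hub agent 2 last, which keeps the intermediate factors small.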

Runs Max-Plus at the current state using the per-state MaxPlusStatistics to compute the best joint action with node-wise exploration bonuses, edge-wise exploration bonuses, or both. Rounds of message passing are followed by per-node maximization.
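
The two phases named above (message passing, then per-node maximization) can be sketched as follows. This is a simplified, self-contained illustration with hypothetical names (`max_plus`, `node_q`, `edge_q`); it works on plain payoffs and omits the exploration bonuses and MCTS statistics that the actual routine uses.

```julia
# node_q[i][ai] is agent i's local payoff; edge_q[(i, j)] (with i < j) is a
# matrix of pairwise payoffs indexed [ai, aj]. All names are hypothetical.
function max_plus(node_q::Vector{Vector{Float64}},
                  edge_q::Dict{Tuple{Int,Int},Matrix{Float64}};
                  rounds::Int = 10, normalize::Bool = true)
    n = length(node_q)
    nbrs = [Int[] for _ in 1:n]
    for (i, j) in keys(edge_q)
        push!(nbrs[i], j); push!(nbrs[j], i)
    end
    # Edge payoff lookup that works in both directions.
    payoff(i, j, ai, aj) = i < j ? edge_q[(i, j)][ai, aj] : edge_q[(j, i)][aj, ai]
    # msgs[(i, j)][aj]: message from agent i to agent j about j's action aj.
    msgs = Dict{Tuple{Int,Int},Vector{Float64}}()
    for i in 1:n, j in nbrs[i]
        msgs[(i, j)] = zeros(length(node_q[j]))
    end
    for _ in 1:rounds
        new_msgs = Dict{Tuple{Int,Int},Vector{Float64}}()
        for i in 1:n, j in nbrs[i]
            m = zeros(length(node_q[j]))
            for aj in eachindex(m)
                best = -Inf
                for ai in eachindex(node_q[i])
                    incoming = 0.0
                    for k in nbrs[i]
                        k != j && (incoming += msgs[(k, i)][ai])
                    end
                    best = max(best, node_q[i][ai] + payoff(i, j, ai, aj) + incoming)
                end
                m[aj] = best
            end
            normalize && (m .-= sum(m) / length(m))   # keep messages bounded
            new_msgs[(i, j)] = m
        end
        msgs = new_msgs
    end
    # Per-node maximization over local payoff plus all incoming messages.
    joint = zeros(Int, n)
    for i in 1:n
        totals = copy(node_q[i])
        for k in nbrs[i]
            totals .+= msgs[(k, i)]
        end
        joint[i] = argmax(totals)
    end
    return joint
end

# Chain 1 - 2 - 3 with two actions per agent and no local payoffs.
node_q = [zeros(2) for _ in 1:3]
edge_q = Dict((1, 2) => [1.0 0.0; 0.0 3.0], (2, 3) => [4.0 0.0; 0.0 1.0])
joint = max_plus(node_q, edge_q)
```

On this tree-structured example the messages settle after two rounds and every agent picks action 1, which is the joint action with the highest total payoff (1 + 4 = 5). On graphs with cycles Max-Plus is only approximate.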


Runs variable elimination at the current state using the VarElStatistics to compute the best joint action with the component-wise exploration bonus. Note: the implementation is rather complicated.
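
For orientation, here is a compact sketch of variable elimination over local payoff tables, followed by back-substitution to recover the joint action. The `Factor` type and all function names are hypothetical, every agent is assumed to have the same number of actions, and the exploration bonus and MCTS statistics are left out, so this is not the routine described above, only the underlying technique.

```julia
struct Factor
    scope::Vector{Int}                  # agent IDs, in the key order of `table`
    table::Dict{Vector{Int},Float64}    # local joint action => payoff
end

# All assignments (agent => action) of `n_actions` actions to the agents in `scope`.
function assignments(scope::Vector{Int}, n_actions::Int)
    result = [Dict{Int,Int}()]
    for agent in scope
        result = [merge(a, Dict(agent => act)) for a in result for act in 1:n_actions]
    end
    return result
end

lookup(f::Factor, a::Dict{Int,Int}) = f.table[Int[a[i] for i in f.scope]]

function variable_elimination(factors::Vector{Factor}, order::Vector{Int}, n_actions::Int)
    pool = copy(factors)
    responses = []   # (agent, scope of its best-response table, table)
    for v in order
        involved = [f for f in pool if v in f.scope]
        pool = [f for f in pool if !(v in f.scope)]
        scope = sort(unique(Int[i for f in involved for i in f.scope if i != v]))
        new_table = Dict{Vector{Int},Float64}()
        best = Dict{Vector{Int},Int}()
        for a in assignments(scope, n_actions)
            vals = Float64[]
            for act in 1:n_actions
                full = merge(a, Dict(v => act))
                push!(vals, sum((lookup(f, full) for f in involved); init = 0.0))
            end
            key = Int[a[i] for i in scope]
            new_table[key] = maximum(vals)   # payoff after maximizing out agent v
            best[key] = argmax(vals)         # agent v's best response
        end
        push!(pool, Factor(scope, new_table))
        push!(responses, (v, scope, best))
    end
    # Back-substitute in reverse elimination order.
    joint = Dict{Int,Int}()
    for (v, scope, best) in reverse(responses)
        joint[v] = best[Int[joint[i] for i in scope]]
    end
    return joint
end

# Chain 1 - 2 - 3, two actions per agent, eliminated in min-degree order.
f12 = Factor([1, 2], Dict([1, 1] => 1.0, [1, 2] => 0.0, [2, 1] => 0.0, [2, 2] => 3.0))
f23 = Factor([2, 3], Dict([1, 1] => 4.0, [1, 2] => 0.0, [2, 1] => 0.0, [2, 2] => 1.0))
best_joint = variable_elimination([f12, f23], [1, 3, 2], 2)
```

Unlike Max-Plus, variable elimination is exact, at the cost of intermediate tables whose size grows with the number of neighbors of the eliminated agent; this is why a good elimination order matters.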


Called internally in solve() to create the FVMCTSPlanner where Max-Plus is the specific action coordination strategy. Creates MaxPlusStatistics and assumes the various Max-Plus flags are passed down from the CoordinationStrategy object given to the solver.


Take the q-value from the MCTS step and distribute the updates across the per-node and per-edge q-stats as per the formula in our paper.


Take the q-value from the MCTS step and distribute the updates across the component q-stats as per the formula in the Amato-Oliehoek paper.


Called internally in solve() to create the FVMCTSPlanner where Var-El is the specific action coordination strategy. Creates VarElStatistics internally with the CG components and the minimum degree ordering heuristic.