ValueIterationPolicy <: Policy

The policy type. Contains the Q-matrix, the utility function, and an array of indices corresponding to the optimal actions. There are three ways to initialize the policy type:

`policy = ValueIterationPolicy(mdp)` 
`policy = ValueIterationPolicy(mdp, utility_array)`
`policy = ValueIterationPolicy(mdp, qmatrix)`

The Q-matrix is n × m, where n is the number of states and m is the number of actions.


  • qmat: Q-matrix storing the Q(s,a) values
  • util: The value function V(s)
  • policy: Policy array, maps each state index to an action index
  • action_map: Maps each action index to the concrete action type
  • include_Q: Flag for including the Q-matrix
  • mdp: The MDP model, used for indexing in the action function
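
The relationship between the fields above can be illustrated with plain arrays. The following is only a minimal sketch and not the package's own code: the Q-matrix values and the `action_map` symbols are made up for illustration, and it assumes that the utility of a state is the largest Q-value in its row and that the policy stores the index of that action.

```julia
# Hypothetical Q-matrix for 3 states (rows) and 2 actions (columns).
qmat = [1.0 2.0;
        0.5 0.1;
        3.0 3.5]

n, m = size(qmat)   # n states, m actions

# V(s) is the best Q-value in row s; the policy keeps the index of that action.
util   = [maximum(qmat[s, :]) for s in 1:n]   # [2.0, 0.5, 3.5]
policy = [argmax(qmat[s, :]) for s in 1:n]    # [2, 1, 2]

# Hypothetical action map from action index to a concrete action.
action_map = [:left, :right]
best_actions = [action_map[a] for a in policy]   # [:right, :left, :right]
```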

ValueIterationSolver <: Solver

The solver type. Contains the following parameters, which can be passed as keyword arguments to the constructor:

- max_iterations::Int64, the maximum number of iterations value iteration runs for (default 100)
- belres::Float64, the Bellman residual threshold; iteration stops once the residual falls below this value (default 1e-3)
- verbose::Bool, if set to true, the Bellman residual and the time per iteration will be printed to STDOUT (default false)
- include_Q::Bool, if set to true, the solver outputs the Q values in addition to the utility and the policy (default true)
- init_util::Vector{Float64}, provides a custom initialization of the utility vector (initializes utility to 0 by default)
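
To show how these parameters typically interact, here is a self-contained sketch of tabular value iteration. It is not the solver's actual implementation: the function name `value_iteration_sketch`, the transition array `T[s, a, sp]`, the reward matrix `R[s, a]`, and the two-state example are all assumptions made for illustration. The sketch updates the utility vector in place, stops when the largest change in an iteration drops below `belres` or when `max_iterations` is reached, and omits the `include_Q` flag and per-iteration timing for brevity.

```julia
# Illustrative sketch only; names and data layout are hypothetical.
function value_iteration_sketch(T::Array{Float64,3}, R::Matrix{Float64}, discount::Float64;
                                max_iterations::Int=100, belres::Float64=1e-3,
                                verbose::Bool=false,
                                init_util::Vector{Float64}=zeros(size(R, 1)))
    n, m = size(R)              # n states, m actions
    util = copy(init_util)      # start from the custom or all-zero utility
    qmat = zeros(n, m)          # Q-matrix, n × m
    for i in 1:max_iterations
        residual = 0.0
        for s in 1:n
            for a in 1:m
                # Bellman backup: immediate reward plus discounted expected utility.
                qmat[s, a] = R[s, a] + discount * sum(T[s, a, sp] * util[sp] for sp in 1:n)
            end
            new_util = maximum(qmat[s, :])
            # Track the largest change in utility over all states.
            residual = max(residual, abs(new_util - util[s]))
            util[s] = new_util
        end
        verbose && println("iteration $i: Bellman residual = $residual")
        residual < belres && break   # converged
    end
    policy = [argmax(qmat[s, :]) for s in 1:n]
    return qmat, util, policy
end

# Hypothetical two-state MDP: staying in state 2 (action 1) earns reward 1.
T = zeros(2, 2, 2)
T[1, 1, 1] = 1.0   # state 1, action 1: stay in state 1
T[1, 2, 2] = 1.0   # state 1, action 2: move to state 2
T[2, 1, 2] = 1.0   # state 2, action 1: stay in state 2
T[2, 2, 1] = 1.0   # state 2, action 2: move to state 1
R = [0.0 0.0;
     1.0 0.0]

qmat, util, policy = value_iteration_sketch(T, R, 0.9)
```

With a discount of 0.9, the exact utilities of this toy problem are 9 for state 1 and 10 for state 2, so the returned `util` should be close to those values and `policy` should pick action 2 in state 1 and action 1 in state 2. Tightening `belres` or raising `max_iterations` trades run time for accuracy.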