DeepQLearning.batch_trajectories — Method
batch_trajectories(s::AbstractArray, traj_length::Int64, batch_size::Int64)
Converts a multidimensional array into batches of trajectories to be processed by a Flux recurrent model. It takes as input an array of dimensions state_dim... x traj_length x batch_size.
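As an illustration, the reshaping can be sketched with Base Julia: slicing the time dimension yields one array of size state_dim... x batch_size per step, which is how a recurrent model consumes a sequence. This is a hypothetical re-implementation, not the package's actual code.

```julia
# Sketch of the reshaping performed by batch_trajectories (illustrative only).
# Assumes the input is laid out as state_dim... x traj_length x batch_size.
function batch_trajectories_sketch(s::AbstractArray, traj_length::Int, batch_size::Int)
    n_state_dims = ndims(s) - 2
    # one slice per time step: traj_length arrays of size state_dim... x batch_size
    return [s[ntuple(_ -> Colon(), n_state_dims)..., t, :] for t in 1:traj_length]
end

s = rand(4, 3, 2)                  # state_dim = 4, traj_length = 3, batch_size = 2
seq = batch_trajectories_sketch(s, 3, 2)
length(seq), size(seq[1])          # (3, (4, 2))
```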
DeepQLearning.evaluation — Function
evaluation(eval_policy, policy, env, obs, global_step, rng)
Returns the average reward of the current policy. The user can specify their own function f to carry out the evaluation; a default basic_evaluation, which is just a rollout, is provided.
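A rollout-style evaluation in the spirit of basic_evaluation can be sketched as below. The environment interface (the reset and step closures) and the toy three-step environment are hypothetical stand-ins, not the package's actual API.

```julia
# Sketch of a rollout evaluation: run n_episodes episodes with the current
# policy and return the average episode return. Illustrative only.
function evaluation_sketch(policy_action, env_reset!, env_step!, n_episodes::Int)
    total = 0.0
    for _ in 1:n_episodes
        obs = env_reset!()
        done = false
        while !done
            a = policy_action(obs)
            obs, r, done = env_step!(a)
            total += r
        end
    end
    return total / n_episodes   # average reward per episode
end

# toy environment: reward 1.0 per step, terminates after 3 steps
t = Ref(0)
reset_env!() = (t[] = 0; 0)
step_env!(a) = (t[] += 1; (t[], 1.0, t[] >= 3))
avg = evaluation_sketch(obs -> 0, reset_env!, step_env!, 5)   # 3.0
```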
DeepQLearning.exploration — Method
exploration(exp_policy, policy, env, obs, global_step, rng)
Returns an action following an exploration policy. The user can provide their own exp_policy function.
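A common choice of exploration policy is epsilon-greedy, sketched below: with probability eps a random action is taken, otherwise the greedy action. The function name and arguments are hypothetical, not the exp_policy signature the package expects.

```julia
using Random

# Sketch of an epsilon-greedy exploration rule (illustrative only).
# greedy_action and actions stand in for the policy's output and action space.
function eps_greedy_sketch(eps::Float64, greedy_action, actions, rng::AbstractRNG)
    return rand(rng) < eps ? rand(rng, actions) : greedy_action
end

rng = MersenneTwister(1)
a = eps_greedy_sketch(0.0, 2, 1:4, rng)   # eps = 0: always the greedy action, 2
```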
DeepQLearning.flattenbatch — Method
flattenbatch(x::AbstractArray)
Flattens a multidimensional array, keeping only the last dimension. It returns a two-dimensional array of size (flatten_dim, batch_size).
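The effect can be sketched with a single Base reshape that collapses all but the last dimension (an illustrative re-implementation, not the package's code):

```julia
# Sketch of flattening everything but the batch dimension (illustrative only).
flattenbatch_sketch(x::AbstractArray) = reshape(x, :, size(x)[end])

x = rand(2, 3, 5)               # e.g. a 2x3 observation with batch_size = 5
size(flattenbatch_sketch(x))    # (6, 5)
```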
DeepQLearning.getnetwork — Function
getnetwork(policy)
Returns the value network of the policy.
DeepQLearning.globalnorm — Method
globalnorm(p::Params, gs::Flux.Zygote.Grads)
Returns the maximum absolute value in the gradients of the weights.
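The computation amounts to a nested maximum over a collection of gradient arrays, sketched here with plain arrays standing in for Flux's Params/Grads containers (illustrative only):

```julia
# Sketch: maximum absolute entry across a collection of gradient arrays.
# Plain arrays stand in for Flux.Zygote.Grads (illustrative only).
maxabs_grad_sketch(grads) = maximum(g -> maximum(abs, g), grads)

gs = [[0.1, -2.5], [1.0 0.3; -0.7 0.2]]
maxabs_grad_sketch(gs)    # 2.5
```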
DeepQLearning.hiddenstates — Method
hiddenstates(m)
Returns the hidden states of all the recurrent layers of a model.
DeepQLearning.huber_loss — Method
huber_loss(x)
Computes the Huber loss (from ReinforcementLearning.jl).
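For reference, the Huber loss with threshold 1 is quadratic for small inputs and linear beyond, which bounds the gradient of large TD errors. A minimal sketch (not necessarily the package's exact definition):

```julia
# Sketch of the Huber loss with threshold 1 (illustrative only):
# quadratic for |x| <= 1, linear with slope 1 beyond.
huber_sketch(x) = abs(x) <= 1 ? 0.5 * x^2 : abs(x) - 0.5

huber_sketch(0.5)   # 0.125
huber_sketch(3.0)   # 2.5
```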
DeepQLearning.isrecurrent — Method
isrecurrent(m)
Returns true if m contains a recurrent layer.
DeepQLearning.resetstate! — Function
resetstate!(policy)
Resets the hidden states of a policy.
DeepQLearning.sethiddenstates! — Method
sethiddenstates!(m, hs)
Given a list of hidden states, sets the hidden state of each recurrent layer of the model m to the corresponding entry in the list. The order of the list should match the order of the recurrent layers in the model.