batch_trajectories(s::AbstractArray, traj_length::Int64, batch_size::Int64)

Converts a multidimensional array into a batch of trajectories to be processed by a Flux recurrent model. The input is an array of dimensions state_dim... x traj_length x batch_size.
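
The shape convention can be illustrated with a minimal sketch in Base Julia, without Flux. The helper `split_trajectories` is hypothetical, and the output layout (one array per time step, with the batch as the last dimension) is an assumption about what a recurrent model would consume; it is not the package's implementation.

```julia
# Hypothetical helper, not part of the package: split an array of size
# (state_dim..., traj_length, batch_size) into one array per time step.
function split_trajectories(s::AbstractArray, traj_length::Int, batch_size::Int)
    @assert size(s, ndims(s) - 1) == traj_length
    @assert size(s, ndims(s)) == batch_size
    # selectdim drops the time dimension, leaving (state_dim..., batch_size).
    return [copy(selectdim(s, ndims(s) - 1, t)) for t in 1:traj_length]
end

# state_dim = 2, traj_length = 3, batch_size = 4
s = reshape(collect(1.0:24.0), 2, 3, 4)
steps = split_trajectories(s, 3, 4)
```

Here `steps` holds three arrays of size `(2, 4)`, one per time step.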

evaluation(eval_policy, policy, env, obs, global_step, rng)

Returns the average reward of the current policy. The user can supply their own eval_policy function to carry out the evaluation; the default, basic_evaluation, is a plain rollout.

exploration(exp_policy, policy, env, obs, global_step, rng)

Returns an action chosen according to an exploration policy. The user can supply their own exp_policy function.
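
One common exploration rule is epsilon-greedy with a linearly decaying epsilon, sketched below. Everything here is an assumption made for illustration: the function name `eps_greedy`, its keyword arguments, the idea that `policy` is a callable returning action values, and the argument order (which simply mirrors the `exploration` signature). The interface the package actually expects from exp_policy may differ.

```julia
using Random

# Hypothetical epsilon-greedy rule; the argument order mirrors the
# `exploration` signature but is an assumption, not the package's contract.
# `policy` is assumed to be a callable mapping an observation to action values.
function eps_greedy(policy, env, obs, global_step, rng::AbstractRNG;
                    eps_start=1.0, eps_end=0.01, decay_steps=1000)
    # Linearly decay epsilon from eps_start to eps_end over decay_steps.
    frac = min(global_step / decay_steps, 1.0)
    epsilon = eps_start + frac * (eps_end - eps_start)
    values = policy(obs)
    if rand(rng) < epsilon
        return rand(rng, 1:length(values))   # explore: uniform random action index
    else
        return argmax(values)                # exploit: greedy action index
    end
end

toy_policy(obs) = [0.1, 0.9, 0.3]            # stand-in for learned action values
a = eps_greedy(toy_policy, nothing, [0.0], 5000, MersenneTwister(1); eps_end=0.0)
```

With `eps_end=0.0` and a step count past `decay_steps`, epsilon is zero, so the call always returns the greedy action index.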

Flattens a multidimensional array by collapsing every dimension except the last. Returns a two-dimensional array of size (flatten_dim, batch_size).
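
A minimal sketch of this reshaping in Base Julia follows. The name `flatten_keep_last` is hypothetical and is used only to show the shape arithmetic.

```julia
# Hypothetical helper: collapse every dimension except the last (the batch).
flatten_keep_last(x::AbstractArray) = reshape(x, :, size(x, ndims(x)))

x = zeros(Float32, 4, 5, 3, 8)   # batch_size = 8
y = flatten_keep_last(x)          # flatten_dim = 4 * 5 * 3 = 60
```

Because `reshape` shares memory with its argument, no data is copied.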

globalnorm(p::Params, gs::Flux.Zygote.Grads)

Returns the maximum absolute value found in the gradients gs of the parameters p.
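
The quantity described is the largest absolute entry across all gradient arrays (an infinity norm, despite the function's name). The sketch below computes it on plain arrays rather than on Flux `Params` and `Grads` objects. The name `max_abs_gradient` and the use of `nothing` for a missing gradient are assumptions made for illustration.

```julia
# Hypothetical stand-in: `grads` is a plain vector of gradient arrays,
# where `nothing` marks a parameter that received no gradient.
function max_abs_gradient(grads)
    gmax = 0.0
    for g in grads
        g === nothing && continue          # skip parameters without a gradient
        gmax = max(gmax, maximum(abs, g))  # largest absolute entry seen so far
    end
    return gmax
end

grads = [[0.5 -2.0; 1.0 0.0], nothing, [-0.25, 3.5]]
result = max_abs_gradient(grads)
```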

sethiddenstates!(m, hs)

Given a list of hidden states hs, sets the hidden state of each recurrent layer of the model m to the corresponding entry of the list. The order of the list must match the order of the recurrent layers in the model.
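
The ordering requirement can be shown with toy types. The structs and the function `set_toy_hidden_states!` below are hypothetical stand-ins, not Flux layers or the package's code; they only show the i-th entry of the list being assigned to the i-th recurrent layer while non-recurrent layers are skipped.

```julia
# Toy stand-ins, not Flux types: a recurrent layer with a mutable hidden
# state, a stateless layer, and a model holding its layers in order.
mutable struct ToyRecur
    state::Vector{Float64}
end
struct ToyDense end
struct ToyChain
    layers::Vector{Any}
end

# Assign hs[i] to the i-th recurrent layer, in model order.
function set_toy_hidden_states!(m::ToyChain, hs)
    recurrent = [l for l in m.layers if l isa ToyRecur]
    @assert length(recurrent) == length(hs)
    for (layer, h) in zip(recurrent, hs)
        layer.state = h
    end
    return m
end

m = ToyChain(Any[ToyRecur(zeros(2)), ToyDense(), ToyRecur(zeros(3))])
set_toy_hidden_states!(m, [[1.0, 2.0], [3.0, 4.0, 5.0]])
```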