(a::ADAMOptimizer{T,N})(Θ::Array{T,N}, g::Array{T,N}, t::Number)

Applies the gradient g to the parameters Θ (mutating) at iteration t ADAM GD update http://sebastianruder.com/optimizing-gradient-descent/index.html#adam



T::Int          # number of time steps
n::Int          # State dimension
m::Int          # Number of control inputs
K::Array{P,3}   # Time-varying feedback gain ∈ R(n,m,T)
k::Array{P,2}   # Open loop control signal  ∈ R(m,T)
Σ::Array{P,3}   # Time-varying controller covariance  ∈ R(m,m,T)
Σi::Array{P,3}  # The inverses of Σ

Minimize 0.5*x'*H*x + x'*g s.t. lower<=x<=upper

inputs: H - positive definite matrix (n * n) g - bias vector (n) lower - lower bounds (n) upper - upper bounds (n)

optional inputs: x0 - initial state (n) options - see below (7)

outputs: x - solution (n) result - result type (roughly, higher is better, see below) Hfree - subspace cholesky factor (nfree * nfree) free - set of free dimensions (n)


Run the iLQG Function to find an optimal trajectory For the "pendulum on a cart system". Requires package ControlSystems.jl


x0 = [π-0.6,0,0,0] goal = [π,0,0,0] Q = Diagonal([10,1,2,1]) : State weight matrix R = 1 : Control weight matrix lims = 5.0*[-1 1] : control limits, T = 600 : Number of time steps doplot = true : Plot results

xnew,unew,cnew,sigmanew = forward_pass(traj_new, x0,u,x,α,f,costfun,lims,diff)


  • α: step size (αk is applied to old trajectory)
  • diff: function to determine difference diff(xnew[:,i], x[:,i])
  • f: forward dynamics x(k+1) = f(x(k), u(k), k)
  • cnew = costfun(xnew, unew)

iLQG - solve the deterministic finite-horizon optimal control problem.

minimize sum_i cost(x[:,i],u[:,i]) + cost(x[:,end]) s.t. x[:,i+1] = f(x[:,i],u[:,i])


f, costfun, df

  1. step:

xnew = f(x,u,i) is called during the forward pass. Here the state x and control u are vectors: size(x)==(n,), size(u)==(m,). The time index i is a scalar.

  1. cost:

cost = costfun(x,u) is called in the forward pass to compute the cost per time-step along the trajectory x,u.

  1. derivatives:

fx,fu,fxx,fxu,fuu,cx,cu,cxx,cxu,cuu = df(x,u) computes the derivatives along a trajectory. In this case size(x)==(n, N) where N is the trajectory length. size(u)==(m, N). The time indexes are I=(1:N). Dimensions match the variable names e.g. size(fxu)==(n, n, m, N) If cost function or system is time invariant, the dimension of the corresponding derivatives can be reduced by dropping the time dimension

x0 - The initial state from which to solve the control problem. Should be a column vector. If a pre-rolled trajectory is available then size(x0)==(n, N) can be provided and cost set accordingly.

u0 - The initial control sequence. A matrix of size(u0)==(m, N) where m is the dimension of the control and N is the number of state transitions.


x - the optimal state trajectory found by the algorithm. size(x)==(n, N)

u - the optimal open-loop control sequence. size(u)==(m, N)

traj_new - A new GaussianPolicy object containing feedforward control trajectory and feedback-gains, these gains multiply the deviation of a simulated trajectory from the nominal trajectory x. See ?GaussianPolicy for more help.

Vx - the gradient of the cost-to-go. size(Vx)==(n, N)

Vxx - the Hessian of the cost-to-go. size(Vxx)==(n, n N)

cost - the costs along the trajectory. size(cost)==(1, N) the cost-to-go is V = fliplr(cumsum(fliplr(cost)))

trace - a trace of various convergence-related values. One row for each iteration, the columns of trace are [iter λ α g_norm Δcost z sum(cost) dλ] see below for details.

Keyword arguments

lims, [], control limits

α, logspace(0,-3,11), backtracking coefficients

tol_fun, 1e-7, reduction exit criterion

tol_grad, 1e-4, gradient exit criterion

max_iter, 500, maximum iterations

λ, 1, initial value for λ

, 1, initial value for dλ

λfactor, 1.6, λ scaling factor

λmax, 1e10, λ maximum value

λmin, 1e-6, below this value λ = 0

regType, 1, regularization type 1: quu+λ*I 2: Vxx+λ*I

reduce_ratio_min, 0, minimal accepted reduction ratio

diff_fun, -, user-defined diff for sub-space optimization

plot, 1, 0: no k>0: every k iters k<0: every k iters, with derivs window

verbosity, 2, 0: no 1: final 2: iter 3: iter, detailed

plot_fun, x->0, user-defined graphics callback

cost, [], initial cost for pre-rolled trajectory

This code consists of a port and extension of a MATLAB library provided by the autors of INPROCEEDINGS{author={Tassa, Y. and Mansard, N. and Todorov, E.}, booktitle={Robotics and Automation (ICRA), 2014 IEEE International Conference on}, title={Control-Limited Differential Dynamic Programming}, year={2014}, month={May}, doi={10.1109/ICRA.2014.6907001}}

`x, u, traj_new, Vx, Vxx, cost, trace = iLQGkl(dynamics,costfun,derivs, x0, traj_prev, model;
    constrain_per_step = false,
    kl_step            = 0,
    lims               = [],                    # Control signal limits ::Matrix ∈ R(m,2)
    tol_fun            = 1e-7,
    tol_grad           = 1e-4,
    max_iter           = 50,
    print_head         = 10,                    # Print headers this often
    print_period       = 1,                     # Print this often
    reduce_ratio_min   = 0,                     # Not used ATM
    diff_fun           = -,
    verbosity          = 2,                     # ∈ (0,3)
    plot_fun           = x->0,                  # Not used
    cost               = [],                    # Supply if pre-rolled trajectory supplied
    ηbracket           = [1e-8,1,1e16],         # dual variable bracket [min_η, η, max_η]
    del0               = 0.0001,                # Start of dual variable increase
    gd_alpha           = 0.01                   # Step size in GD (ADAMOptimizer) when constrain_per_step is true

Solves the iLQG problem with constraints on control signals lims and bound on the KL-divergence kl_step from the old trajectory distribution traj_prev::GaussianPolicy.

To solve the maximum entropy problem, use controller controller(xi,i) = u[:,i] + K[:,:,i]*(xi-x[:,i]) + chol(Σ)*randn(m) where K comes from traj_new. Note that multiplying the cost by a constant changes the relative weight between the cost term and the entropy term, i.e., higher cost produces less noise through chol(Σ) since (Σ = Qᵤᵤ⁻¹).


Calculate the Q terms related to the KL-constraint. (Actually, only related to log(p̂(τ)) since the constraint is rewritten as Entropy term and other term dissapears into expectation under p(τ).) Qtt is [Qxx Qxu; Qux Quu] Qt is [Qx; Qu] These terms should be added to the Q terms calculated in the backwards pass to produce the final Q terms. This Function should be called from within the backwards_pass Function or just prior to it to adjust the cost derivative matrices.