DifferentialDynamicProgramming.ADAMOptimizerMethod
(a::ADAMOptimizer{T,N})(Θ::Array{T,N}, g::Array{T,N}, t::Number)

Applies the gradient g to the parameters Θ (mutating) at iteration t ADAM GD update http://sebastianruder.com/optimizing-gradient-descent/index.html#adam

DifferentialDynamicProgramming.GaussianPolicyType
GaussianPolicy{P}

Fileds:

T::Int          # number of time steps
n::Int          # State dimension
m::Int          # Number of control inputs
K::Array{P,3}   # Time-varying feedback gain ∈ R(n,m,T)
k::Array{P,2}   # Open loop control signal  ∈ R(m,T)
Σ::Array{P,3}   # Time-varying controller covariance  ∈ R(m,m,T)
Σi::Array{P,3}  # The inverses of Σ
DifferentialDynamicProgramming.boxQPMethod

Minimize 0.5*x'*H*x + x'*g s.t. lower<=x<=upper

inputs: H - positive definite matrix (n * n) g - bias vector (n) lower - lower bounds (n) upper - upper bounds (n)

optional inputs: x0 - initial state (n) options - see below (7)

outputs: x - solution (n) result - result type (roughly, higher is better, see below) Hfree - subspace cholesky factor (nfree * nfree) free - set of free dimensions (n)

DifferentialDynamicProgramming.demo_pendcartMethod
demo_pendcart(;kwargs...)

Run the iLQG Function to find an optimal trajectory For the "pendulum on a cart system". Requires package ControlSystems.jl

Arguments

x0 = [π-0.6,0,0,0] goal = [π,0,0,0] Q = Diagonal([10,1,2,1]) : State weight matrix R = 1 : Control weight matrix lims = 5.0*[-1 1] : control limits, T = 600 : Number of time steps doplot = true : Plot results

DifferentialDynamicProgramming.forward_passMethod
xnew,unew,cnew,sigmanew = forward_pass(traj_new, x0,u,x,α,f,costfun,lims,diff)

Arguments

• α: step size (αk is applied to old trajectory)
• diff: function to determine difference diff(xnew[:,i], x[:,i])
• f: forward dynamics x(k+1) = f(x(k), u(k), k)
• cnew = costfun(xnew, unew)
DifferentialDynamicProgramming.iLQGMethod

iLQG - solve the deterministic finite-horizon optimal control problem.

minimize sum_i cost(x[:,i],u[:,i]) + cost(x[:,end]) s.t. x[:,i+1] = f(x[:,i],u[:,i])

Inputs

f, costfun, df

1. step:

xnew = f(x,u,i) is called during the forward pass. Here the state x and control u are vectors: size(x)==(n,), size(u)==(m,). The time index i is a scalar.

1. cost:

cost = costfun(x,u) is called in the forward pass to compute the cost per time-step along the trajectory x,u.

1. derivatives:

fx,fu,fxx,fxu,fuu,cx,cu,cxx,cxu,cuu = df(x,u) computes the derivatives along a trajectory. In this case size(x)==(n, N) where N is the trajectory length. size(u)==(m, N). The time indexes are I=(1:N). Dimensions match the variable names e.g. size(fxu)==(n, n, m, N) If cost function or system is time invariant, the dimension of the corresponding derivatives can be reduced by dropping the time dimension

x0 - The initial state from which to solve the control problem. Should be a column vector. If a pre-rolled trajectory is available then size(x0)==(n, N) can be provided and cost set accordingly.

u0 - The initial control sequence. A matrix of size(u0)==(m, N) where m is the dimension of the control and N is the number of state transitions.

Outputs

x - the optimal state trajectory found by the algorithm. size(x)==(n, N)

u - the optimal open-loop control sequence. size(u)==(m, N)

traj_new - A new GaussianPolicy object containing feedforward control trajectory and feedback-gains, these gains multiply the deviation of a simulated trajectory from the nominal trajectory x. See ?GaussianPolicy for more help.

Vx - the gradient of the cost-to-go. size(Vx)==(n, N)

Vxx - the Hessian of the cost-to-go. size(Vxx)==(n, n N)

cost - the costs along the trajectory. size(cost)==(1, N) the cost-to-go is V = fliplr(cumsum(fliplr(cost)))

trace - a trace of various convergence-related values. One row for each iteration, the columns of trace are [iter λ α g_norm Δcost z sum(cost) dλ] see below for details.

Keyword arguments

lims, [], control limits

α, logspace(0,-3,11), backtracking coefficients

tol_fun, 1e-7, reduction exit criterion

tol_grad, 1e-4, gradient exit criterion

max_iter, 500, maximum iterations

λ, 1, initial value for λ

dλ, 1, initial value for dλ

λfactor, 1.6, λ scaling factor

λmax, 1e10, λ maximum value

λmin, 1e-6, below this value λ = 0

regType, 1, regularization type 1: quu+λ*I 2: Vxx+λ*I

reduce_ratio_min, 0, minimal accepted reduction ratio

diff_fun, -, user-defined diff for sub-space optimization

plot, 1, 0: no k>0: every k iters k<0: every k iters, with derivs window

verbosity, 2, 0: no 1: final 2: iter 3: iter, detailed

plot_fun, x->0, user-defined graphics callback

cost, [], initial cost for pre-rolled trajectory

This code consists of a port and extension of a MATLAB library provided by the autors of INPROCEEDINGS{author={Tassa, Y. and Mansard, N. and Todorov, E.}, booktitle={Robotics and Automation (ICRA), 2014 IEEE International Conference on}, title={Control-Limited Differential Dynamic Programming}, year={2014}, month={May}, doi={10.1109/ICRA.2014.6907001}}

DifferentialDynamicProgramming.iLQGklMethod
x, u, traj_new, Vx, Vxx, cost, trace = iLQGkl(dynamics,costfun,derivs, x0, traj_prev, model;
constrain_per_step = false,
kl_step            = 0,
lims               = [],                    # Control signal limits ::Matrix ∈ R(m,2)
tol_fun            = 1e-7,
max_iter           = 50,
print_period       = 1,                     # Print this often
reduce_ratio_min   = 0,                     # Not used ATM
diff_fun           = -,
verbosity          = 2,                     # ∈ (0,3)
plot_fun           = x->0,                  # Not used
cost               = [],                    # Supply if pre-rolled trajectory supplied
ηbracket           = [1e-8,1,1e16],         # dual variable bracket [min_η, η, max_η]
del0               = 0.0001,                # Start of dual variable increase
gd_alpha           = 0.01                   # Step size in GD (ADAMOptimizer) when constrain_per_step is true
)

Solves the iLQG problem with constraints on control signals lims and bound on the KL-divergence kl_step from the old trajectory distribution traj_prev::GaussianPolicy.

To solve the maximum entropy problem, use controller controller(xi,i) = u[:,i] + K[:,:,i]*(xi-x[:,i]) + chol(Σ)*randn(m) where K comes from traj_new. Note that multiplying the cost by a constant changes the relative weight between the cost term and the entropy term, i.e., higher cost produces less noise through chol(Σ) since (Σ = Qᵤᵤ⁻¹).

DifferentialDynamicProgramming.∇klMethod

Calculate the Q terms related to the KL-constraint. (Actually, only related to log(p̂(τ)) since the constraint is rewritten as Entropy term and other term dissapears into expectation under p(τ).) Qtt is [Qxx Qxu; Qux Quu] Qt is [Qx; Qu] These terms should be added to the Q terms calculated in the backwards pass to produce the final Q terms. This Function should be called from within the backwards_pass Function or just prior to it to adjust the cost derivative matrices.