`DifferentialDynamicProgramming.ADAMOptimizer`

— Method `(a::ADAMOptimizer{T,N})(Θ::Array{T,N}, g::Array{T,N}, t::Number)`

Applies the gradient `g` to the parameters `Θ` (mutating) at iteration `t`.

ADAM gradient-descent update; see http://sebastianruder.com/optimizing-gradient-descent/index.html#adam
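The update rule is the standard ADAM step. A minimal stand-alone sketch follows (this is not the package's actual optimizer type; `adam_update!`, its moment buffers `m` and `v`, and the default hyperparameters are assumptions of this example):

```
# Minimal sketch of one ADAM step: mutates Θ given gradient g at iteration t.
# m and v are the first- and second-moment buffers (same shape as Θ).
function adam_update!(Θ, g, m, v, t; α = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    @. m = β1 * m + (1 - β1) * g        # update biased first moment
    @. v = β2 * v + (1 - β2) * g^2      # update biased second moment
    m̂ = m ./ (1 - β1^t)                 # bias-corrected first moment
    v̂ = v ./ (1 - β2^t)                 # bias-corrected second moment
    @. Θ -= α * m̂ / (sqrt(v̂) + ϵ)       # descend
    return Θ
end
```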

`DifferentialDynamicProgramming.GaussianPolicy`

— Type `GaussianPolicy{P}`

**Fields:**

```
T::Int # number of time steps
n::Int # State dimension
m::Int # Number of control inputs
K::Array{P,3} # Time-varying feedback gain ∈ R(n,m,T)
k::Array{P,2} # Open loop control signal ∈ R(m,T)
Σ::Array{P,3} # Time-varying controller covariance ∈ R(m,m,T)
Σi::Array{P,3} # The inverses of Σ
```

`DifferentialDynamicProgramming.KLmv`

— Method This is the inverse of Σₓᵤ.

`DifferentialDynamicProgramming.boxQP`

— Method Minimize `0.5*x'*H*x + x'*g` s.t. `lower <= x <= upper`

Inputs:

- `H`: positive definite matrix (n × n)
- `g`: bias vector (n)
- `lower`: lower bounds (n)
- `upper`: upper bounds (n)

Optional inputs:

- `x0`: initial state (n)
- `options`: see below (7)

Outputs:

- `x`: solution (n)
- `result`: result type (roughly, higher is better, see below)
- `Hfree`: subspace Cholesky factor (n_free × n_free)
- `free`: set of free dimensions (n)
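For intuition about what this box-constrained QP computes, the same problem can be solved (slowly) by naive projected gradient descent. The sketch below is for illustration only and is not the package's `boxQP`, which uses a more efficient projected-Newton method with a Cholesky factorization over the free dimensions:

```
using LinearAlgebra

# Illustrative projected gradient descent for min 0.5x'Hx + x'g s.t. lower .<= x .<= upper.
# Not the package's boxQP; just a slow reference solver for the same problem.
function boxqp_pgd(H, g, lower, upper; iters = 5000)
    x = clamp.(zeros(length(g)), lower, upper)  # feasible start
    η = 1 / opnorm(H)                           # safe step size for a PD Hessian
    for _ in 1:iters
        x = clamp.(x .- η .* (H * x .+ g), lower, upper)  # gradient step, then project
    end
    return x
end
```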

`DifferentialDynamicProgramming.calc_η`

— Method `new_η, satisfied, divergence = calc_η(xnew, xold, sigmanew, η, traj_new, traj_prev, kl_step)`

This function calculates the step size.

`DifferentialDynamicProgramming.demo_pendcart`

— Method`demo_pendcart(;kwargs...)`

Run the iLQG function to find an optimal trajectory for the "pendulum on a cart" system. Requires the package ControlSystems.jl.

**Arguments**

- `x0 = [π-0.6,0,0,0]`
- `goal = [π,0,0,0]`
- `Q = Diagonal([10,1,2,1])`: state weight matrix
- `R = 1`: control weight matrix
- `lims = 5.0*[-1 1]`: control limits
- `T = 600`: number of time steps
- `doplot = true`: plot results

`DifferentialDynamicProgramming.forward_pass`

— Method`xnew,unew,cnew,sigmanew = forward_pass(traj_new, x0,u,x,α,f,costfun,lims,diff)`

**Arguments**

- `α`: step size (`αk` is applied to the old trajectory)
- `diff`: function to determine the difference, `diff(xnew[:,i], x[:,i])`
- `f`: forward dynamics, `x(k+1) = f(x(k), u(k), k)`

The cost along the new trajectory is computed as `cnew = costfun(xnew, unew)`.
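A bare-bones open-loop rollout with these ingredients might look like this (a sketch only; the real `forward_pass` also applies the feedback gains, the step size `α`, control limits, and covariance propagation, and `rollout` is a name invented for this example):

```
# Sketch of a plain open-loop rollout: x(k+1) = f(x(k), u(k), k), accumulating the cost.
# costfun here is assumed to take a single (state, control) pair per time step.
function rollout(f, costfun, x0, u)
    N = size(u, 2)
    x = zeros(length(x0), N + 1)
    x[:, 1] = x0
    c = 0.0
    for k in 1:N
        x[:, k+1] = f(x[:, k], u[:, k], k)  # forward dynamics
        c += costfun(x[:, k], u[:, k])      # running cost
    end
    return x, c
end
```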

`DifferentialDynamicProgramming.iLQG`

— Method iLQG: solve the deterministic finite-horizon optimal control problem

minimize `sum_i cost(x[:,i],u[:,i]) + cost(x[:,end])` s.t. `x[:,i+1] = f(x[:,i],u[:,i])`

**Inputs**

`f, costfun, df`

- step: `xnew = f(x,u,i)` is called during the forward pass. Here the state `x` and control `u` are vectors: `size(x)==(n,)`, `size(u)==(m,)`. The time index `i` is a scalar.
- cost: `cost = costfun(x,u)` is called in the forward pass to compute the cost per time step along the trajectory `x,u`.
- derivatives: `fx,fu,fxx,fxu,fuu,cx,cu,cxx,cxu,cuu = df(x,u)` computes the derivatives along a trajectory. In this case `size(x)==(n,N)`, where `N` is the trajectory length, and `size(u)==(m,N)`. The time indices are `i = 1:N`. Dimensions match the variable names, e.g. `size(fxu)==(n,n,m,N)`. If the cost function or the system is time invariant, the dimension of the corresponding derivatives can be reduced by dropping the time dimension.

`x0`

- The initial state from which to solve the control problem. Should be a column vector. If a pre-rolled trajectory is available then size(x0)==(n, N) can be provided and cost set accordingly.

`u0`

- The initial control sequence. A matrix of size(u0)==(m, N) where m is the dimension of the control and N is the number of state transitions.
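As an illustration of the required signatures, here is how the three functions could be defined for a discrete-time double integrator with a quadratic cost (everything below is an assumption of this example, not package code):

```
using LinearAlgebra

# Double integrator: n = 2 states, m = 1 control, linear dynamics, quadratic cost.
const A = [1.0 0.1; 0.0 1.0]
const B = reshape([0.0, 0.1], 2, 1)

f(x, u, i)    = A * x + B * u                        # xnew = f(x, u, i)
costfun(x, u) = 0.5 * (sum(abs2, x) + sum(abs2, u))  # cost per time step

function df(x, u)  # x is (n,N), u is (m,N)
    n, m, N = size(x, 1), size(u, 1), size(x, 2)
    fx  = repeat(A, 1, 1, N)                # (n,n,N)
    fu  = repeat(B, 1, 1, N)                # (n,m,N)
    fxx = fxu = fuu = []                    # linear dynamics: no second derivatives
    cx, cu = x, u                           # gradients of the quadratic cost
    cxx = repeat(Matrix(1.0I, n, n), 1, 1, N)
    cxu = zeros(n, m, N)
    cuu = repeat(Matrix(1.0I, m, m), 1, 1, N)
    return fx, fu, fxx, fxu, fuu, cx, cu, cxx, cxu, cuu
end
```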

**Outputs**

`x`

- the optimal state trajectory found by the algorithm. size(x)==(n, N)

`u`

- the optimal open-loop control sequence. size(u)==(m, N)

`traj_new`

- A new `GaussianPolicy`

object containing the feedforward control trajectory and feedback gains; these gains multiply the deviation of a simulated trajectory from the nominal trajectory `x`. See `?GaussianPolicy`

for more help.

`Vx`

- the gradient of the cost-to-go. size(Vx)==(n, N)

`Vxx`

- the Hessian of the cost-to-go. size(Vxx)==(n, n, N)

`cost`

- the costs along the trajectory. size(cost)==(1, N). The cost-to-go is `V = fliplr(cumsum(fliplr(cost)))`
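In Julia, the MATLAB-style `fliplr(cumsum(fliplr(cost)))` corresponds to a reversed cumulative sum, e.g. (`cost_to_go` is a name invented for this example):

```
# Cost-to-go from per-step costs: V[i] = sum(cost[i:end]).
cost_to_go(cost::AbstractVector) = reverse(cumsum(reverse(cost)))
```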

`trace`

- a trace of various convergence-related values. One row for each iteration, the columns of trace are `[iter λ α g_norm Δcost z sum(cost) dλ]`

see below for details.

**Keyword arguments**

- `lims`, `[]`, control limits
- `α`, `logspace(0,-3,11)`, backtracking coefficients
- `tol_fun`, `1e-7`, reduction exit criterion
- `tol_grad`, `1e-4`, gradient exit criterion
- `max_iter`, `500`, maximum iterations
- `λ`, `1`, initial value for λ
- `dλ`, `1`, initial value for dλ
- `λfactor`, `1.6`, λ scaling factor
- `λmax`, `1e10`, λ maximum value
- `λmin`, `1e-6`, below this value λ = 0
- `regType`, `1`, regularization type, 1: `Quu + λ*I`, 2: `Vxx + λ*I`
- `reduce_ratio_min`, `0`, minimal accepted reduction ratio
- `diff_fun`, `-`, user-defined diff for sub-space optimization
- `plot`, `1`, 0: no plot, k>0: plot every k iterations, k<0: plot every k iterations with a derivatives window
- `verbosity`, `2`, 0: none, 1: final, 2: each iteration, 3: each iteration, detailed
- `plot_fun`, `x->0`, user-defined graphics callback
- `cost`, `[]`, initial cost for a pre-rolled trajectory

This code consists of a port and extension of a MATLAB library provided by the authors of `INPROCEEDINGS{author={Tassa, Y. and Mansard, N. and Todorov, E.}, booktitle={Robotics and Automation (ICRA), 2014 IEEE International Conference on}, title={Control-Limited Differential Dynamic Programming}, year={2014}, month={May}, doi={10.1109/ICRA.2014.6907001}}`

`DifferentialDynamicProgramming.iLQGkl`

— Method

```
x, u, traj_new, Vx, Vxx, cost, trace = iLQGkl(dynamics, costfun, derivs, x0, traj_prev, model;
    constrain_per_step = false,
    kl_step = 0,
    lims = [], # Control signal limits ::Matrix ∈ R(m,2)
    tol_fun = 1e-7,
    tol_grad = 1e-4,
    max_iter = 50,
    print_head = 10, # Print headers this often
    print_period = 1, # Print this often
    reduce_ratio_min = 0, # Not used ATM
    diff_fun = -,
    verbosity = 2, # ∈ (0,3)
    plot_fun = x->0, # Not used
    cost = [], # Supply if a pre-rolled trajectory is supplied
    ηbracket = [1e-8,1,1e16], # dual variable bracket [min_η, η, max_η]
    del0 = 0.0001, # Start of dual variable increase
    gd_alpha = 0.01 # Step size in GD (ADAMOptimizer) when constrain_per_step is true
)
```

Solves the iLQG problem with constraints on the control signals `lims` and a bound `kl_step` on the KL divergence from the old trajectory distribution `traj_prev::GaussianPolicy`.

To solve the maximum-entropy problem, use the controller `controller(xi,i) = u[:,i] + K[:,:,i]*(xi-x[:,i]) + chol(Σ)*randn(m)`, where `K` comes from `traj_new`. Note that multiplying the cost by a constant changes the relative weight between the cost term and the entropy term, i.e., a higher cost produces less noise through `chol(Σ)`, since `Σ = Qᵤᵤ⁻¹`.
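That controller can be written as a closure; a sketch follows (in current Julia, `chol(Σ)` is spelled `cholesky(Σ).L`; `make_controller` is a name invented for this example):

```
using LinearAlgebra

# Max-entropy controller sketch: feedforward + feedback + Gaussian exploration noise.
# u is the (m,T) open-loop control, K the per-step feedback gain, x the nominal (n,T) trajectory.
function make_controller(u, K, x, Σ)
    m = size(u, 1)
    (xi, i) -> u[:, i] + K[:, :, i] * (xi - x[:, i]) + cholesky(Σ[:, :, i]).L * randn(m)
end
```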

`DifferentialDynamicProgramming.kl_div`

— Method This function produces many negative values, which are clipped by `max(0, kl)`.

`DifferentialDynamicProgramming.kl_div_wiki`

— Method This version seems to be symmetric and positive.

`DifferentialDynamicProgramming.∇kl`

— Method Calculates the Q terms related to the KL constraint. (Actually, only related to log(p̂(τ)), since the constraint is rewritten as an entropy term and the other term disappears into the expectation under p(τ).) `Qtt` is `[Qxx Qxu; Qux Quu]` and `Qt` is `[Qx; Qu]`. These terms should be added to the Q terms calculated in the backwards pass to produce the final Q terms. This function should be called from within the `backwards_pass` function, or just prior to it, to adjust the cost derivative matrices.