`DifferentialDynamicProgramming.ADAMOptimizer`

— Method `(a::ADAMOptimizer{T,N})(Θ::Array{T,N}, g::Array{T,N}, t::Number)`

Applies the gradient `g` to the parameters `Θ` (mutating) at iteration `t`.

ADAM gradient-descent update; see http://sebastianruder.com/optimizing-gradient-descent/index.html#adam
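The update rule is the standard ADAM step. A minimal stand-alone sketch follows (this is not the package's actual optimizer type; `adam_update!`, its moment buffers `m` and `v`, and the default hyperparameters are assumptions of this example):

```
# Minimal sketch of one ADAM step: mutates Θ given gradient g at iteration t.
# m and v are the first- and second-moment buffers (same shape as Θ).
function adam_update!(Θ, g, m, v, t; α = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    @. m = β1 * m + (1 - β1) * g        # update biased first moment
    @. v = β2 * v + (1 - β2) * g^2      # update biased second moment
    m̂ = m ./ (1 - β1^t)                 # bias-corrected first moment
    v̂ = v ./ (1 - β2^t)                 # bias-corrected second moment
    @. Θ -= α * m̂ / (sqrt(v̂) + ϵ)       # descend
    return Θ
end
```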

`DifferentialDynamicProgramming.GaussianPolicy`

— Type `GaussianPolicy{P}`

**Fields:**

```
T::Int # number of time steps
n::Int # State dimension
m::Int # Number of control inputs
K::Array{P,3} # Time-varying feedback gain ∈ R(n,m,T)
k::Array{P,2} # Open loop control signal ∈ R(m,T)
Σ::Array{P,3} # Time-varying controller covariance ∈ R(m,m,T)
Σi::Array{P,3} # The inverses of Σ
```

`DifferentialDynamicProgramming.KLmv`

— Method This is the inverse of Σₓᵤ.

`DifferentialDynamicProgramming.boxQP`

— Method Minimize `0.5*x'*H*x + x'*g` s.t. `lower <= x <= upper`

Inputs:

- `H`: positive definite matrix (n × n)
- `g`: bias vector (n)
- `lower`: lower bounds (n)
- `upper`: upper bounds (n)

Optional inputs:

- `x0`: initial state (n)
- `options`: see below (7)

Outputs:

- `x`: solution (n)
- `result`: result type (roughly, higher is better, see below)
- `Hfree`: subspace Cholesky factor (n_free × n_free)
- `free`: set of free dimensions (n)
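For intuition about what this box-constrained QP computes, the same problem can be solved (slowly) by naive projected gradient descent. The sketch below is for illustration only and is not the package's `boxQP`, which uses a more efficient projected-Newton method with a Cholesky factorization over the free dimensions:

```
using LinearAlgebra

# Illustrative projected gradient descent for min 0.5x'Hx + x'g s.t. lower .<= x .<= upper.
# Not the package's boxQP; just a slow reference solver for the same problem.
function boxqp_pgd(H, g, lower, upper; iters = 5000)
    x = clamp.(zeros(length(g)), lower, upper)  # feasible start
    η = 1 / opnorm(H)                           # safe step size for a PD Hessian
    for _ in 1:iters
        x = clamp.(x .- η .* (H * x .+ g), lower, upper)  # gradient step, then project
    end
    return x
end
```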

`DifferentialDynamicProgramming.calc_η`

— Method `new_η, satisfied, divergence = calc_η(xnew, xold, sigmanew, η, traj_new, traj_prev, kl_step)`

This function calculates the step size.

`DifferentialDynamicProgramming.demo_pendcart`

— Method`demo_pendcart(;kwargs...)`

Run the iLQG function to find an optimal trajectory for the "pendulum on a cart" system. Requires the package ControlSystems.jl.

**Arguments**

- `x0 = [π-0.6,0,0,0]`
- `goal = [π,0,0,0]`
- `Q = Diagonal([10,1,2,1])`: state weight matrix
- `R = 1`: control weight matrix
- `lims = 5.0*[-1 1]`: control limits
- `T = 600`: number of time steps
- `doplot = true`: plot results

`DifferentialDynamicProgramming.forward_pass`

— Method`xnew,unew,cnew,sigmanew = forward_pass(traj_new, x0,u,x,α,f,costfun,lims,diff)`

**Arguments**

- `α`: step size (`αk` is applied to the old trajectory)
- `diff`: function to determine the difference, `diff(xnew[:,i], x[:,i])`
- `f`: forward dynamics, `x(k+1) = f(x(k), u(k), k)`

The cost along the new trajectory is computed as `cnew = costfun(xnew, unew)`.
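A bare-bones open-loop rollout with these ingredients might look like this (a sketch only; the real `forward_pass` also applies the feedback gains, the step size `α`, control limits, and covariance propagation, and `rollout` is a name invented for this example):

```
# Sketch of a plain open-loop rollout: x(k+1) = f(x(k), u(k), k), accumulating the cost.
# costfun here is assumed to take a single (state, control) pair per time step.
function rollout(f, costfun, x0, u)
    N = size(u, 2)
    x = zeros(length(x0), N + 1)
    x[:, 1] = x0
    c = 0.0
    for k in 1:N
        x[:, k+1] = f(x[:, k], u[:, k], k)  # forward dynamics
        c += costfun(x[:, k], u[:, k])      # running cost
    end
    return x, c
end
```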

`DifferentialDynamicProgramming.iLQG`

— Method iLQG: solve the deterministic finite-horizon optimal control problem

minimize `sum_i cost(x[:,i],u[:,i]) + cost(x[:,end])` s.t. `x[:,i+1] = f(x[:,i],u[:,i])`

**Inputs**

`f, costfun, df`

- step: `xnew = f(x,u,i)` is called during the forward pass. Here the state `x` and control `u` are vectors: `size(x)==(n,)`, `size(u)==(m,)`. The time index `i` is a scalar.
- cost: `cost = costfun(x,u)` is called in the forward pass to compute the cost per time step along the trajectory `x,u`.
- derivatives: `fx,fu,fxx,fxu,fuu,cx,cu,cxx,cxu,cuu = df(x,u)` computes the derivatives along a trajectory. In this case `size(x)==(n,N)`, where `N` is the trajectory length, and `size(u)==(m,N)`. The time indices are `i = 1:N`. Dimensions match the variable names, e.g. `size(fxu)==(n,n,m,N)`. If the cost function or the system is time invariant, the dimension of the corresponding derivatives can be reduced by dropping the time dimension.

`x0`

- The initial state from which to solve the control problem. Should be a column vector. If a pre-rolled trajectory is available then size(x0)==(n, N) can be provided and cost set accordingly.

`u0`

- The initial control sequence. A matrix of size(u0)==(m, N) where m is the dimension of the control and N is the number of state transitions.
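As an illustration of the required signatures, here is how the three functions could be defined for a discrete-time double integrator with a quadratic cost (everything below is an assumption of this example, not package code):

```
using LinearAlgebra

# Double integrator: n = 2 states, m = 1 control, linear dynamics, quadratic cost.
const A = [1.0 0.1; 0.0 1.0]
const B = reshape([0.0, 0.1], 2, 1)

f(x, u, i)    = A * x + B * u                        # xnew = f(x, u, i)
costfun(x, u) = 0.5 * (sum(abs2, x) + sum(abs2, u))  # cost per time step

function df(x, u)  # x is (n,N), u is (m,N)
    n, m, N = size(x, 1), size(u, 1), size(x, 2)
    fx  = repeat(A, 1, 1, N)                # (n,n,N)
    fu  = repeat(B, 1, 1, N)                # (n,m,N)
    fxx = fxu = fuu = []                    # linear dynamics: no second derivatives
    cx, cu = x, u                           # gradients of the quadratic cost
    cxx = repeat(Matrix(1.0I, n, n), 1, 1, N)
    cxu = zeros(n, m, N)
    cuu = repeat(Matrix(1.0I, m, m), 1, 1, N)
    return fx, fu, fxx, fxu, fuu, cx, cu, cxx, cxu, cuu
end
```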

**Outputs**

`x`

- the optimal state trajectory found by the algorithm. size(x)==(n, N)

`u`

- the optimal open-loop control sequence. size(u)==(m, N)

`traj_new`

- A new `GaussianPolicy`

object containing the feedforward control trajectory and feedback gains; these gains multiply the deviation of a simulated trajectory from the nominal trajectory `x`. See `?GaussianPolicy`

for more help.

`Vx`

- the gradient of the cost-to-go. size(Vx)==(n, N)

`Vxx`

- the Hessian of the cost-to-go. size(Vxx)==(n, n, N)

`cost`

- the costs along the trajectory. size(cost)==(1, N). The cost-to-go is `V = fliplr(cumsum(fliplr(cost)))`
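In Julia, the MATLAB-style `fliplr(cumsum(fliplr(cost)))` corresponds to a reversed cumulative sum, e.g. (`cost_to_go` is a name invented for this example):

```
# Cost-to-go from per-step costs: V[i] = sum(cost[i:end]).
cost_to_go(cost::AbstractVector) = reverse(cumsum(reverse(cost)))
```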

`trace`

- a trace of various convergence-related values. One row for each iteration, the columns of trace are `[iter λ α g_norm Δcost z sum(cost) dλ]`

see below for details.

**Keyword arguments**

- `lims`, `[]`, control limits
- `α`, `logspace(0,-3,11)`, backtracking coefficients
- `tol_fun`, `1e-7`, reduction exit criterion
- `tol_grad`, `1e-4`, gradient exit criterion
- `max_iter`, `500`, maximum iterations
- `λ`, `1`, initial value for λ
- `dλ`, `1`, initial value for dλ
- `λfactor`, `1.6`, λ scaling factor
- `λmax`, `1e10`, λ maximum value
- `λmin`, `1e-6`, below this value λ = 0
- `regType`, `1`, regularization type, 1: `Quu + λ*I`, 2: `Vxx + λ*I`
- `reduce_ratio_min`, `0`, minimal accepted reduction ratio
- `diff_fun`, `-`, user-defined diff for sub-space optimization
- `plot`, `1`, 0: no plot, k>0: plot every k iterations, k<0: plot every k iterations with a derivatives window
- `verbosity`, `2`, 0: none, 1: final, 2: each iteration, 3: each iteration, detailed
- `plot_fun`, `x->0`, user-defined graphics callback
- `cost`, `[]`, initial cost for a pre-rolled trajectory

This code consists of a port and extension of a MATLAB library provided by the authors of `INPROCEEDINGS{author={Tassa, Y. and Mansard, N. and Todorov, E.}, booktitle={Robotics and Automation (ICRA), 2014 IEEE International Conference on}, title={Control-Limited Differential Dynamic Programming}, year={2014}, month={May}, doi={10.1109/ICRA.2014.6907001}}`

`DifferentialDynamicProgramming.iLQGkl`

— Method

```
x, u, traj_new, Vx, Vxx, cost, trace = iLQGkl(dynamics, costfun, derivs, x0, traj_prev, model;
    constrain_per_step = false,
    kl_step = 0,
    lims = [], # Control signal limits ::Matrix ∈ R(m,2)
    tol_fun = 1e-7,
    tol_grad = 1e-4,
    max_iter = 50,
    print_head = 10, # Print headers this often
    print_period = 1, # Print this often
    reduce_ratio_min = 0, # Not used ATM
    diff_fun = -,
    verbosity = 2, # ∈ (0,3)
    plot_fun = x->0, # Not used
    cost = [], # Supply if a pre-rolled trajectory is supplied
    ηbracket = [1e-8,1,1e16], # dual variable bracket [min_η, η, max_η]
    del0 = 0.0001, # Start of dual variable increase
    gd_alpha = 0.01 # Step size in GD (ADAMOptimizer) when constrain_per_step is true
)
```

Solves the iLQG problem with constraints on the control signals `lims` and a bound `kl_step` on the KL divergence from the old trajectory distribution `traj_prev::GaussianPolicy`.

To solve the maximum-entropy problem, use the controller `controller(xi,i) = u[:,i] + K[:,:,i]*(xi-x[:,i]) + chol(Σ)*randn(m)`, where `K` comes from `traj_new`. Note that multiplying the cost by a constant changes the relative weight between the cost term and the entropy term, i.e., a higher cost produces less noise through `chol(Σ)`, since `Σ = Qᵤᵤ⁻¹`.
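That controller can be written as a closure; a sketch follows (in current Julia, `chol(Σ)` is spelled `cholesky(Σ).L`; `make_controller` is a name invented for this example):

```
using LinearAlgebra

# Max-entropy controller sketch: feedforward + feedback + Gaussian exploration noise.
# u is the (m,T) open-loop control, K the per-step feedback gain, x the nominal (n,T) trajectory.
function make_controller(u, K, x, Σ)
    m = size(u, 1)
    (xi, i) -> u[:, i] + K[:, :, i] * (xi - x[:, i]) + cholesky(Σ[:, :, i]).L * randn(m)
end
```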

`DifferentialDynamicProgramming.kl_div`

— Method This function produces many negative values, which are clipped by `max(0, kl)`.

`DifferentialDynamicProgramming.kl_div_wiki`

— Method This version seems to be symmetric and positive.

`DifferentialDynamicProgramming.∇kl`

— Method Calculates the Q terms related to the KL constraint. (Actually, only related to log(p̂(τ)), since the constraint is rewritten as an entropy term and the other term disappears into the expectation under p(τ).) `Qtt` is `[Qxx Qxu; Qux Quu]` and `Qt` is `[Qx; Qu]`. These terms should be added to the Q terms calculated in the backwards pass to produce the final Q terms. This function should be called from within the `backwards_pass` function, or just prior to it, to adjust the cost derivative matrices.