# Local Gradient-Based Optimization
## Recommended Methods
`ADAM()` is a good default with a decent convergence rate. `BFGS()` can converge faster, but is more prone to getting stuck in bad local optima. `LBFGS()` requires less memory than `BFGS()` and thus can scale better.
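
For concreteness, here is a minimal sketch of the `solve(problem, optimizer)` pattern used throughout this page. The Rosenbrock objective, the `OptimizationFunction`/`OptimizationProblem` setup, and the AD backend choice are illustrative assumptions rather than requirements:

```julia
using GalacticOptim, Flux

# Rosenbrock function as an illustrative objective (an assumption,
# not prescribed by this page)
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

x0 = zeros(2)     # initial guess
p  = [1.0, 100.0] # fixed parameters of the objective

# gradient-based optimizers need derivatives, so attach an AD backend
f = OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff())
problem = OptimizationProblem(f, x0, p)

# ADAM with its default learning rate; Flux optimizers run for a
# fixed iteration budget, so maxiters is required
sol = solve(problem, ADAM(0.001), maxiters = 1000)
```

The later examples on this page reuse this `problem`.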
## Flux.jl
`Flux.Optimise.Descent`: Classic gradient descent optimizer with learning rate.

`solve(problem, Descent(η))`

- `η` is the learning rate
- Defaults: `η = 0.1`
`Flux.Optimise.Momentum`: Classic gradient descent optimizer with learning rate and momentum.

`solve(problem, Momentum(η, ρ))`

- `η` is the learning rate
- `ρ` is the momentum
- Defaults: `η = 0.01`, `ρ = 0.9`
`Flux.Optimise.Nesterov`: Gradient descent optimizer with learning rate and Nesterov momentum.

`solve(problem, Nesterov(η, ρ))`

- `η` is the learning rate
- `ρ` is the Nesterov momentum
- Defaults: `η = 0.01`, `ρ = 0.9`
`Flux.Optimise.RMSProp`: RMSProp optimizer.

`solve(problem, RMSProp(η, ρ))`

- `η` is the learning rate
- `ρ` is the momentum
- Defaults: `η = 0.001`, `ρ = 0.9`
`Flux.Optimise.ADAM`: ADAM optimizer.

`solve(problem, ADAM(η, β::Tuple))`

- `η` is the learning rate
- `β::Tuple` is the decay of momentums
- Defaults: `η = 0.001`, `β::Tuple = (0.9, 0.999)`
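
As a short sketch (reusing the `problem` defined above), the momentum decays are passed as a single tuple; the hyperparameter values here are arbitrary illustrations:

```julia
# ADAM with a larger learning rate and a shorter second-moment memory;
# (0.9, 0.99) is an illustrative choice, not a recommendation
sol = solve(problem, ADAM(0.01, (0.9, 0.99)), maxiters = 500)
```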
`Flux.Optimise.RADAM`: Rectified ADAM optimizer.

`solve(problem, RADAM(η, β::Tuple))`

- `η` is the learning rate
- `β::Tuple` is the decay of momentums
- Defaults: `η = 0.001`, `β::Tuple = (0.9, 0.999)`
`Flux.Optimise.AdaMax`: AdaMax optimizer.

`solve(problem, AdaMax(η, β::Tuple))`

- `η` is the learning rate
- `β::Tuple` is the decay of momentums
- Defaults: `η = 0.001`, `β::Tuple = (0.9, 0.999)`
`Flux.Optimise.ADAGrad`: ADAGrad optimizer.

`solve(problem, ADAGrad(η))`

- `η` is the learning rate
- Defaults: `η = 0.1`
`Flux.Optimise.ADADelta`: ADADelta optimizer.

`solve(problem, ADADelta(ρ))`

- `ρ` is the gradient decay factor
- Defaults: `ρ = 0.9`
`Flux.Optimise.AMSGrad`: AMSGrad optimizer.

`solve(problem, AMSGrad(η, β::Tuple))`

- `η` is the learning rate
- `β::Tuple` is the decay of momentums
- Defaults: `η = 0.001`, `β::Tuple = (0.9, 0.999)`
`Flux.Optimise.NADAM`: Nesterov variant of the ADAM optimizer.

`solve(problem, NADAM(η, β::Tuple))`

- `η` is the learning rate
- `β::Tuple` is the decay of momentums
- Defaults: `η = 0.001`, `β::Tuple = (0.9, 0.999)`
`Flux.Optimise.ADAMW`: ADAMW optimizer.

`solve(problem, ADAMW(η, β::Tuple, decay))`

- `η` is the learning rate
- `β::Tuple` is the decay of momentums
- `decay` is the decay applied to the weights
- Defaults: `η = 0.001`, `β::Tuple = (0.9, 0.999)`, `decay = 0`
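
Since `ADAMW` is the only optimizer above with a weight-decay term, here is a hedged sketch of passing all three positional arguments (the decay value is arbitrary):

```julia
# ADAMW = ADAM + decoupled weight decay; the third positional
# argument is `decay` (0.01 here is purely illustrative)
sol = solve(problem, ADAMW(0.001, (0.9, 0.999), 0.01), maxiters = 500)
```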
## Optim.jl
`Optim.ConjugateGradient`: Conjugate gradient descent.

`solve(problem, ConjugateGradient(alphaguess, linesearch, eta, P, precondprep))`

- `alphaguess` computes the initial step length (for more information, consult this source and this example)
  - Available initial step length procedures: `InitialPrevious`, `InitialStatic`, `InitialHagerZhang`, `InitialQuadratic`, `InitialConstantChange`
- `linesearch` specifies the line search algorithm (for more information, consult this source and this example)
  - Available line search algorithms: `HagerZhang`, `MoreThuente`, `BackTracking`, `StrongWolfe`, `Static`
- `eta` determines the next step direction
- `P` is an optional preconditioner (for more information, see this source)
- `precondprep` is used to update `P` as the state variable `x` changes
- Defaults:

```julia
alphaguess = LineSearches.InitialHagerZhang(),
linesearch = LineSearches.HagerZhang(),
eta = 0.4,
P = nothing,
precondprep = (P, x) -> nothing
```
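
A sketch of overriding these defaults through Optim.jl's keyword constructor, assuming the same `problem` as above; `InitialQuadratic` and `BackTracking` are arbitrary picks from the lists in this entry:

```julia
using Optim, LineSearches

# conjugate gradient with a backtracking line search and a quadratic
# initial step length guess instead of the HagerZhang defaults
opt = ConjugateGradient(alphaguess = LineSearches.InitialQuadratic(),
                        linesearch = LineSearches.BackTracking())
sol = solve(problem, opt)
```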
`Optim.GradientDescent`: Gradient descent.

`solve(problem, GradientDescent(alphaguess, linesearch, P, precondprep))`

- `alphaguess` computes the initial step length (for more information, consult this source and this example)
  - Available initial step length procedures: `InitialPrevious`, `InitialStatic`, `InitialHagerZhang`, `InitialQuadratic`, `InitialConstantChange`
- `linesearch` specifies the line search algorithm (for more information, consult this source and this example)
  - Available line search algorithms: `HagerZhang`, `MoreThuente`, `BackTracking`, `StrongWolfe`, `Static`
- `P` is an optional preconditioner (for more information, see this source)
- `precondprep` is used to update `P` as the state variable `x` changes
- Defaults:

```julia
alphaguess = LineSearches.InitialPrevious(),
linesearch = LineSearches.HagerZhang(),
P = nothing,
precondprep = (P, x) -> nothing
```
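
For example, a sketch of fixing the initial step length guess to a constant; the `alpha` keyword of `LineSearches.InitialStatic` and the value `0.5` are illustrative assumptions:

```julia
using Optim, LineSearches

# gradient descent that starts every line search from α = 0.5
opt = GradientDescent(alphaguess = LineSearches.InitialStatic(alpha = 0.5))
sol = solve(problem, opt)
```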
`Optim.BFGS`: Broyden-Fletcher-Goldfarb-Shanno algorithm.

`solve(problem, BFGS(alphaguess, linesearch, initial_invH, initial_stepnorm, manifold))`

- `alphaguess` computes the initial step length (for more information, consult this source and this example)
  - Available initial step length procedures: `InitialPrevious`, `InitialStatic`, `InitialHagerZhang`, `InitialQuadratic`, `InitialConstantChange`
- `linesearch` specifies the line search algorithm (for more information, consult this source and this example)
  - Available line search algorithms: `HagerZhang`, `MoreThuente`, `BackTracking`, `StrongWolfe`, `Static`
- `initial_invH` specifies an optional initial matrix for the inverse Hessian approximation
- `initial_stepnorm`, if set, makes `initial_invH` an identity matrix scaled by the value of `initial_stepnorm` multiplied by the sup-norm of the gradient at the initial point (see the sketch after this entry)
- `manifold` specifies a (Riemannian) manifold on which the function is to be minimized (for more information, consult this source)
  - Available manifolds: `Flat`, `Sphere`, `Stiefel`
  - Meta-manifolds: `PowerManifold`, `ProductManifold`
  - Custom manifolds
- Defaults:
  - `alphaguess = LineSearches.InitialStatic()`
  - `linesearch = LineSearches.HagerZhang()`
  - `initial_invH = nothing`
  - `initial_stepnorm = nothing`
  - `manifold = Flat()`
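
A sketch of the `initial_stepnorm` mechanism described above, reusing the same `problem`; the value `0.01` is arbitrary:

```julia
using Optim

# the initial inverse-Hessian approximation becomes
# 0.01 * ||∇f(x0)||_∞ * I, which often tames the first step
sol = solve(problem, BFGS(initial_stepnorm = 0.01))
```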
`Optim.LBFGS`: Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm.
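
This entry is truncated; as a hedged sketch, `LBFGS` can be used with its defaults, and Optim.jl exposes the history length as the keyword `m` (an assumption not documented on this page):

```julia
using Optim

# L-BFGS keeps only the last m curvature pairs instead of a dense
# inverse Hessian, so memory grows linearly with problem dimension
sol = solve(problem, LBFGS(m = 10))
```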