Algorithms

This section contains all main algorithms of the package. These are the ones typical users will call.

The typical signature for these algorithms is:

my_algorithm(f, grad!, lmo, x0)

Standard algorithms

FrankWolfe.frank_wolfe — Method

frank_wolfe(f, grad!, lmo, x0; ...)

Simplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:

x final iterate
v last vertex from the LMO
primal primal value f(x)
dual_gap final Frank-Wolfe gap
traj_data vector of trajectory information.

FrankWolfe.lazified_conditional_gradient — Method

lazified_conditional_gradient(f, grad!, lmo_base, x0; ...)

Similar to FrankWolfe.frank_wolfe but lazyfying the LMO: each call is stored in a cache, which is looked up first for a good-enough direction. The cache used is a FrankWolfe.MultiCacheLMO or a FrankWolfe.VectorCacheLMO depending on whether the provided cache_size option is finite.

FrankWolfe.stochastic_frank_wolfe — Method

stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)

Stochastic version of Frank-Wolfe, evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.

Keyword arguments include batch_size to pass a fixed batch_size or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like Variance-reduced and projection-free stochastic optimization, E Hazan, H Luo, 2016.

Similarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).

FrankWolfe.block_coordinate_frank_wolfe — Function

block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}

Block-coordinate version of the Frank-Wolfe algorithm. Minimizes objective f over the product of feasible domains specified by the lmo. The optional argument the update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated. The argument update_step is a single instance or tuple of FrankWolfe.UpdateStep and defines which FW-algorithms to use to update the iterates in the different blocks.

The method returns a tuple (x, v, primal, dual_gap, traj_data) with:

x cartesian product of final iterates
v cartesian product of last vertices of the LMOs
primal primal value f(x)
dual_gap final Frank-Wolfe gap
traj_data vector of trajectory information.

See S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.

Active-set based methods

The following algorithms maintain the representation of the iterates as a convex combination of vertices.

Away-step

FrankWolfe.away_frank_wolfe — Method

away_frank_wolfe(f, grad!, lmo, x0; ...)

Frank-Wolfe with away steps. The algorithm maintains the current iterate as a convex combination of vertices in the FrankWolfe.ActiveSet data structure. See M. Besançon, A. Carderera and S. Pokutta 2021 for illustrations of away steps.

Pairwise Frank-Wolfe

FrankWolfe.blended_pairwise_conditional_gradient — Method

blended_pairwise_conditional_gradient(f, grad!, lmo, x0; kwargs...)

Implements the BPCG algorithm from Tsuji, Tanaka, Pokutta (2021). The method uses an active set of current vertices. Unlike away-step, it transfers weight from an away vertex to another vertex of the active set.

FrankWolfe.blended_pairwise_conditional_gradient — Method

blended_pairwise_conditional_gradient(f, grad!, lmo, active_set::AbstractActiveSet; kwargs...)

Warm-starts BPCG with a pre-defined active_set.

FrankWolfe.pairwise_frank_wolfe — Method

pairwise_frank_wolfe(f, grad!, lmo, x0; ...)

Frank-Wolfe with pairwise steps. The algorithm maintains the current iterate as a convex combination of vertices in the FrankWolfe.ActiveSet data structure. See M. Besançon, A. Carderera and S. Pokutta 2021 for illustrations of away steps. Unlike away-step, it transfers weight from an away vertex to another vertex.

Blended Conditional Gradient

FrankWolfe.accelerated_simplex_gradient_descent_over_probability_simplex — Method

accelerated_simplex_gradient_descent_over_probability_simplex

Minimizes an objective function over the unit probability simplex until the Strong-Wolfe gap is below tolerance using Nesterov's accelerated gradient descent.

FrankWolfe.blended_conditional_gradient — Method

blended_conditional_gradient(f, grad!, lmo, x0)

Entry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. "Blended conditonal gradients" ICML 2019. The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0 and adding new vertices by calling the linear oracle in a lazy fashion.

FrankWolfe.build_reduced_problem — Method

build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)

Given an active set formed by vectors , a (constant) Hessian and a gradient constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates of dimension equal to the cardinality of the active set, the objective function is:

f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ

In the case where we find that the current iterate has a strong-Wolfe gap over the convex hull of the active set that is below the tolerance we return nothing (as there is nothing to do).

FrankWolfe.lp_separation_oracle — Method

Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction) or the same tuple with y from the LMO.

inplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.

FrankWolfe.minimize_over_convex_hull! — Method

minimize_over_convex_hull!

Given a function f with gradient grad! and an active set active_set this function will minimize the function over the convex hull of the active set until the strong-wolfe gap over the active set is below tolerance.

It will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.

FrankWolfe.projection_simplex_sort — Method

projection_simplex_sort(x; s=1.0)

Perform a projection onto the probability simplex of radius s using a sorting algorithm.

FrankWolfe.simplex_gradient_descent_over_convex_hull — Method

simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)

Minimizes an objective function over the convex hull of the active set until the Strong-Wolfe gap is below tolerance using simplex gradient descent.

FrankWolfe.simplex_gradient_descent_over_probability_simplex — Method

simplex_gradient_descent_over_probability_simplex

Minimizes an objective function over the unit probability simplex until the Strong-Wolfe gap is below tolerance using gradient descent.

FrankWolfe.strong_frankwolfe_gap — Method

Checks the strong Frank-Wolfe gap for the reduced problem.

FrankWolfe.strong_frankwolfe_gap_probability_simplex — Method

strong_frankwolfe_gap_probability_simplex

Compute the Strong-Wolfe gap over the unit probability simplex given a gradient.

Blended Pairwise Conditional Gradient

Alternating Methods

Problems over intersections of convex sets, i.e.

\[\min_{x \in \bigcap_{i=1}^n P_i} f(x),\]

pose a challenge as one has to combine the information of two or more LMOs.

FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.

FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.

FrankWolfe.alternating_linear_minimization — Method

alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Alternating Linear Minimization minimizes the objective f over the intersections of the feasible domains specified by lmos. The tuple x0 defines the initial points for each domain. Returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:

x cartesian product of final iterates
v cartesian product of last vertices of the LMOs
primal primal value f(x)
dual_gap final Frank-Wolfe gap
infeas sum of squared, pairwise distances between iterates
traj_data vector of trajectory information.

FrankWolfe.alternating_projections — Method

alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Computes a point in the intersection of feasible domains specified by lmos. Returns a tuple (x, v, dual_gap, infeas, traj_data) with:

x cartesian product of final iterates
v cartesian product of last vertices of the LMOs
dual_gap final Frank-Wolfe gap
infeas sum of squared, pairwise distances between iterates
traj_data vector of trajectory information.