# How does it work?

`FrankWolfe.jl`

contains generic routines to solve optimization problems of the form

\[\min_{x \in \mathcal{C}} f(x)\]

where $\mathcal{C}$ is a compact convex set and $f$ is a differentiable function. These routines work by solving a sequence of linear subproblems:

\[\min_{x \in \mathcal{C}} \langle d_k, x \rangle \quad \text{where} \quad d_k = \nabla f(x_k)\]

## Linear Minimization Oracles

The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. Given a direction $d$, it returns an optimal vertex of the feasible set:

\[v \in \arg \min_{x\in \mathcal{C}} \langle d,x \rangle.\]

### Custom LMOs

To be used by the algorithms provided here, an LMO must be a subtype of `FrankWolfe.LinearMinimizationOracle`

and implement the following method:

`compute_extreme_point(lmo::LMO, direction; kwargs...) -> v`

This method should minimize $v \mapsto \langle d, v \rangle$ over the set $\mathcal{C}$ defined by the LMO. Note that this means the set $\mathcal{C}$ doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.

### Pre-defined LMOs

If you don't want to define your LMO manually, several common implementations are available out-of-the-box:

- Simplices: unit simplex, probability simplex
- Balls in various norms
- Polytopes: K-sparse, Birkhoff

You can use an oracle defined via a Linear Programming solver (e.g. `SCIP`

or `HiGHS`

) with `MathOptInferface`

: see `FrankWolfe.MathOptLMO`

.

Finally, we provide wrappers to combine oracles easily, for example in a product.

See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.

## Optimization algorithms

The package features several variants of Frank-Wolfe that share the same basic API.

Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).

### Standard Frank-Wolfe (FW)

It is implemented in the `frank_wolfe`

function.

See Jaggi (2013) for an overview.

This algorithm works both for convex and non-convex functions (use step size rule `FrankWolfe.Nonconvex()`

in the second case).

### Away-step Frank-Wolfe (AFW)

It is implemented in the `away_frank_wolfe`

function.

See Lacoste-Julien, Jaggi (2015) for an overview.

### Stochastic Frank-Wolfe (SFW)

It is implemented in the `FrankWolfe.stochastic_frank_wolfe`

function.

### Blended Conditional Gradients (BCG)

It is implemented in the `blended_conditional_gradient`

function, with a built-in stability feature that temporarily increases accuracy.

See Braun, Pokutta, Tu, Wright (2018).

### Blended Pairwise Conditional Gradients (BPCG)

It is implemented in the `FrankWolfe.blended_pairwise_conditional_gradient`

function, with a minor modification to improve sparsity.

See Tsuji, Tanaka, Pokutta (2021)

### Comparison

The following table compares the characteristics of the algorithms presented in the package:

Algorithm | Progress/Iteration | Time/Iteration | Sparsity | Numerical Stability | Active Set | Lazifiable |
---|---|---|---|---|---|---|

FW | Low | Low | Low | High | No | Yes |

AFW | Medium | Medium-High | Medium | Medium-High | Yes | Yes |

B(P)CG | High | Medium-High | High | Medium | Yes | By design |

SFW | Low | Low | Low | High | No | No |

While the standard Frank-Wolfe algorithm can only move *towards* extreme points of the compact convex set $\mathcal{C}$, Away-step Frank-Wolfe can move *away* from them. The following figure from our paper illustrates this behaviour:

.

Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.

### Block-Coordinate Frank-Wolfe (BCFW)

It is implemented in the `FrankWolfe.block_coordinate_frank_wolfe`

function.

See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.

### Alternating Linear Minimization (ALM)

It is implemented in the `FrankWolfe.alternating_linear_minimization`

function.