1 Introduction
1.1 Organisation of the paper
1.2 Notation
2 Problem and general scheme
2.1 The classical Frank–Wolfe method
3 Examples
3.1 Traffic assignment
3.2 Submodular optimization
3.3 LASSO problem
3.4 Matrix completion
3.5 Adversarial attacks in machine learning
3.6 Minimum enclosing ball
3.7 Training linear Support Vector Machines
3.8 Finding maximal cliques in graphs
3.9 Finding sparse points in a set
4 Stepsizes
-
unit stepsize:mainly used when the problem has a concave objective function. Finite convergence can be proved, under suitable assumptions, both for the unit-stepsize FW and some of its variants described in the literature (see, e.g., Rinaldi et al. 2010 for further details).$$\begin{aligned} \alpha _k=1, \end{aligned}$$
-
exact line search:where we pick the smallest minimizer of the function \(\varphi \) for the sake of being well-defined even in rare cases of ties (see, e.g., Bomze et al. 2020; Lacoste-Julien and Jaggi 2015).$$\begin{aligned} \alpha _k = \min {{\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{\alpha \in [0, \alpha _k^{\max }]}} \varphi (\alpha )} \quad {\text{ with } \varphi (\alpha ):=f({\mathsf {x}}_k + \alpha \, {\mathsf {d}}_k) }\, , \end{aligned}$$(26)
-
Armijo line search: the method iteratively shrinks the step size in order to guarantee a sufficient reduction of the objective function. It represents a good way to replace exact line search in cases when it gets too costly. In practice, we fix parameters \(\delta \in (0,1)\) and \(\gamma \in (0,\frac{1}{2})\), then try steps \(\alpha =\delta ^{m}\alpha _k^{\max }\) with \(m\in \{ 0,1,2,\dots \}\) until the sufficient decrease inequalityholds, and set \(\alpha _k=\alpha \) (see, e.g., Bomze et al. 2019 and references therein).$$\begin{aligned} f({\mathsf {x}}_k+\alpha \,{\mathsf {d}}_k)\le f({\mathsf {x}}_k)+\gamma \alpha \, \nabla f({\mathsf {x}}_k)^{\intercal } {\mathsf {d}}_k \end{aligned}$$(27)
-
Lipschitz constant dependent step size:with L the Lipschitz constant of \(\nabla f\) (see, e.g., Bomze et al. 2020; Pedregosa et al. 2020).$$\begin{aligned} \alpha _k = \alpha _k(L) := \min \left\{ -\, \frac{ \nabla f({\mathsf {x}}_k) ^{\intercal }{\mathsf {d}}_k}{L\Vert {\mathsf {d}}_k \Vert ^2}, \alpha _k^{max} \right\} \, , \end{aligned}$$(28)
5 Properties of the FW method and its variants
5.1 The FW gap
5.2 \({{\mathcal {O}}}(1/k)\) rate for convex objectives
5.3 Variants
Variant | Direction | Active set |
---|---|---|
FW | \({\mathsf {d}}_k = {\mathsf {d}}_k^{FW} = {\mathsf {s}}_k - {\mathsf {x}}_k, \quad {\mathsf {s}}_k \in {{\,\mathrm{arg\,max}\,}}\{\nabla f({\mathsf {x}}_k)^\intercal {\mathsf {x}} : {\mathsf {x}}\in C\} \) | – |
AFW | \( {\mathsf {d}}_k \in {{\,\mathrm{arg\,max}\,}}\{-\nabla f({\mathsf {x}}_k)^\intercal {\mathsf {d}} : {\mathsf {d}}\in \{{\mathsf {x}}_k - {\mathsf {v}}_k, {\mathsf {d}}_k^{FW}\}, \ {\mathsf {v}}_k \in A_k \}\) | \(A_{k + 1}\subseteq A_k \cup \{{\mathsf {s}}_{k}\}\) |
PFW | \( {\mathsf {d}}_k = {\mathsf {s}}_k - {\mathsf {v}}_k, \quad {\mathsf {v}}_k \in {{\,\mathrm{arg\,max}\,}}\{\nabla f({\mathsf {x}}_k)^\intercal {\mathsf {v}}_k : {\mathsf {v}}_k \in A_k \} \) | \(A_{k + 1}\subseteq A_k \cup \{{\mathsf {s}}_{k}\}\) |
EFW | \( {\mathsf {d}}_k = {\mathsf {y}}_k - {\mathsf {x}}_k, \quad {\mathsf {y}}_k \in {{\,\mathrm{arg\,min}\,}}\{f({\mathsf {y}}): {\mathsf {y}}\in {{\,\mathrm{conv}\,}}(A_k) \} \) | \(A_{k + 1}\subseteq A_k \cup \{{\mathsf {s}}_{k}\}\) |
FDFW | \( {\mathsf {d}}_k \in {{\,\mathrm{arg\,max}\,}}\{-\nabla f({\mathsf {x}}_k)^\intercal {\mathsf {d}} : {\mathsf {d}}\in \{{\mathsf {x}}_k - {\mathsf {v}}_k, {\mathsf {d}}_k^{FW}\}, \ {\mathsf {v}}_k \in A_k \} \) | \(A_k = {\mathcal {F}}({\mathsf {x}}_k)\) |
5.4 Sparse approximation properties
5.5 Affine invariance
5.6 Support identification for the AFW
5.7 Inexact linear oracle
6 Improved rates for strongly convex objectives
Method | Objective | Domain | Assumptions | Rate |
---|---|---|---|---|
FW | NC | Generic | – |
\({{\mathcal {O}}}(1/\sqrt{k})\)
|
FW | C | Generic | – |
\({{\mathcal {O}}}(1/k)\)
|
FW | SC | Generic |
\({\mathsf {x}}^* \in {{\,\mathrm{ri}\,}}(C)\)
| Linear |
Variants | SC | Polytope | – | Linear |
FW | SC | Strongly convex | – |
\({{\mathcal {O}}}(1/k^2)\)
|
FW | SC | Strongly convex |
\(\min \Vert \nabla f({\mathsf {x}}) \Vert > 0 \)
| Linear |