Further properties of the forward–backward envelope with applications to difference-of-convex programming

Abstract

In this paper, we further study the forward–backward envelope first introduced in Patrinos and Bemporad (Proceedings of the IEEE Conference on Decision and Control, pp 2358–2363, 2013) and Stella et al. (Comput Optim Appl, doi:10.1007/s10589-017-9912-y, 2017) for problems whose objective is the sum of a proper closed convex function and a twice continuously differentiable possibly nonconvex function with Lipschitz continuous gradient. We derive sufficient conditions on the original problem for the corresponding forward–backward envelope to be a level-bounded and Kurdyka–Łojasiewicz function with an exponent of \(\frac{1}{2}\); these results are important for the efficient minimization of the forward–backward envelope by classical optimization algorithms. In addition, we demonstrate how to minimize some difference-of-convex regularized least squares problems by minimizing a suitably constructed forward–backward envelope. Our preliminary numerical results on randomly generated instances of large-scale \(\ell _{1-2}\) regularized least squares problems (Yin et al. in SIAM J Sci Comput 37:A536–A563, 2015) illustrate that an implementation of this approach with a limited-memory BFGS scheme usually outperforms standard first-order methods such as the nonmonotone proximal gradient method in Wright et al. (IEEE Trans Signal Process 57:2479–2493, 2009).

Notes

  1. This is equivalent to \(\liminf _{\Vert x\Vert \rightarrow \infty }g(x) = \infty \); see [4, Page 83].

  2. \(\lambda _{\max }(A^TA)\) is computed via the MATLAB commands opts.issym = 1; lambda = eigs(A*A',1,'LM',opts); when \(m>2000\), and via lambda = norm(A*A') otherwise.

  3. We note in passing that the computation of \(\nabla F_\gamma \) is simple: it involves the proximal mapping of P in (36), which boils down to evaluating the \(\ell _1\) shrinkage operator (the proximal mapping of the \(\ell _1\) norm) and the projection onto B(0, 1); a minimal sketch of these two maps is given right after these notes. In addition, in our numerical experiments below, the steepest descent direction was never invoked.

  4. Thus, with high probability, A does not have zero columns.

  5. The dimension parameters are similar to those used in [21, Section 3], except that our s is twice as large. This indicates that the test instances we consider here are harder in terms of sparse recovery.

  6. The CPU times under the FBE column do not include the times for computing \(\lambda _{\max }(A^TA)\). The latter are reported separately in the fourth column of each table.

  7. For all three algorithms, we output the \(z^k\) at termination and compute the function value \(h(z^k)\).
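
The two maps mentioned in note 3 are completely standard. As an illustration only (the names shrink_l1 and proj_ball are ours and not part of the paper's code), a minimal MATLAB sketch could read:

    shrink_l1 = @(z, t) sign(z) .* max(abs(z) - t, 0);     % proximal mapping of t*||.||_1 (soft-thresholding)
    proj_ball = @(z, r) z * min(1, r / max(norm(z), eps));  % Euclidean projection onto the closed ball B(0, r)

For example, shrink_l1([3; -0.2; 1], 1) returns \((2, 0, 0)^T\), and proj_ball leaves any vector of norm at most r unchanged.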

References

  1. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)

  2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

  3. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)

  4. Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer, Berlin (2003)

  5. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)

  6. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)

  7. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)

  8. Byrd, R.H., Chin, G.M., Nocedal, J., Oztoprak, F.: A family of second-order methods for convex \(\ell _1\)-regularized optimization. Math. Program. 159, 435–467 (2016)

  9. Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005)

  10. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)

  11. Chen, X., Lu, Z., Pong, T.K.: Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 26, 1465–1492 (2016)

  12. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)

  13. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems Volume I/II. Springer, Berlin (2003)

  14. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  15. Friedlander, M., Goh, G.: Efficient evaluation of scaled proximal operators. Preprint arXiv:1603.05719 (2016)

  16. Gong, P., Zhang, C., Lu, Z., Huang, J.Z., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. Proc. Int. Conf. Mach. Learn. 28, 37–45 (2013)

  17. Griesse, R., Lorenz, D.A.: A semismooth Newton method for Tikhonov functionals with sparsity constraints. Inverse Probl. 24, 035007 (2008)

  18. Kan, C., Song, W.: The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Anal. Theory Methods Appl. 75, 1385–1399 (2012)

  19. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24, 1420–1443 (2014)

  20. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Preprint arXiv:1602.02915 (2016)

  21. Lu, Z., Pong, T.K., Zhang, Y.: An alternating direction method for finding Dantzig selectors. Comput. Stat. Data Anal. 56, 4037–4046 (2012)

  22. Luo, Z.Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30, 408–425 (1992)

  23. Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46, 157–178 (1993)

  24. Luo, Z.Q., Tseng, P.: On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res. 18, 846–867 (1993)

  25. Milzarek, A., Ulbrich, M.: A semismooth Newton method with multidimensional filter globalization for \(\ell _1\)-optimization. SIAM J. Optim. 24, 298–333 (2014)

  26. Nocedal, J., Wright, S.J.: Numerical Optimization, 1st edn. Springer, Berlin (1999)

  27. Noll, D., Rondepierre, A.: Convergence of linesearch and trust-region methods using the Kurdyka–Łojasiewicz inequality. In: Bailey, D.H., Bauschke, H.H., Borwein, P., Garvan, F., Théra, M., Vanderwerff, J.D., Wolkowicz, H. (eds.) Computational and Analytical Mathematics. Springer, Berlin (2013)

  28. Patrinos, P., Bemporad, A.: Proximal Newton methods for convex composite optimization. In: Proceedings of the IEEE Conference on Decision and Control, pp. 2358–2363. (2013)

  29. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)

  30. Stella, L., Themelis, A., Patrinos, P.: Forward–backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. (2017). doi:10.1007/s10589-017-9912-y

  31. Tseng, P., Yun, S.: A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Comput. Optim. Appl. 47, 179–206 (2010)

  32. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)

  33. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. Ser. B 125, 263–295 (2010)

  34. Wang, Y., Luo, Z., Zhang, X.: New improved penalty methods for sparse reconstruction based on difference of two norms. Preprint. doi:10.13140/RG.2.1.3256.3369 (2015)

  35. Wright, S.J., Nowak, R., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)

  36. Xiao, X., Li, Y., Wen, Z., Zhang, L.: Semi-smooth second-order type methods for composite convex programs. Preprint arXiv:1603.07870 (2016)

  37. Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37, A536–A563 (2015)

  38. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

  39. Zhou, Z., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Preprint arXiv:1512.03518 (2015)

Author information

Correspondence to Ting Kei Pong.

Additional information

Tianxiang Liu was supported partly by the AMSS-PolyU Joint Research Institute Postdoctoral Scheme.

Ting Kei Pong was supported partly by Hong Kong Research Grants Council PolyU253008/15p.

Appendix: closed form formula for NPG subproblems

In this appendix, we derive a closed form formula for the following problem,

$$\begin{aligned} \min _{x} \ \frac{1}{2}\Vert x - y\Vert ^2 + \mu _1 \Vert x\Vert _1 - \mu _2 \Vert x\Vert , \end{aligned}$$
(40)

where \(\mu _1 \ge \mu _2 > 0\) and \(y\in \mathbb {R}^n\) is given. It is easy to see that (40) covers (37) as a special case and hence we will obtain a closed form formula for these NPG subproblems. To proceed with our derivation, we first establish the following lemma.

Lemma 7.1

Let \(v\in \mathbb {R}^n\) and define \({{\mathcal {I}}}:= \{i:\; v_i < 0\}\). Consider the optimization problem

$$\begin{aligned} \min _{\Vert x\Vert = 1, x\ge 0} v^Tx \end{aligned}$$
(41)
  1. (i)

    Suppose that \({{\mathcal {I}}}\ne \emptyset \). Then an optimal solution \(x^*\) of (41) is given by

    $$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} -\frac{v_i}{\Vert v_{{{\mathcal {I}}}}\Vert } &{} \mathrm{if}\ i \in {{\mathcal {I}}},\\ 0 &{} \mathrm{otherwise}, \end{array}\right. } \end{aligned}$$
    (42)

    where \(v_{{\mathcal {I}}}\) is the subvector of v indexed by \({{\mathcal {I}}}\).

  2. (ii)

    Suppose that \({{\mathcal {I}}} = \emptyset \) and take an \(i_*\in \{i:\; v_i = \min _k v_k\}\). Then an optimal solution \(x^*\) of (41) is given by

    $$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} 1 &{} \mathrm{if}\ i = i_*,\\ 0 &{} \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$
    (43)

Proof

It is clear that an optimal solution of (41) exists.

Suppose first that \({{\mathcal {I}}} \ne \emptyset \). In this case, we consider the following relaxation of (41):

$$\begin{aligned} \min _{\Vert x\Vert \le 1, x\ge 0} v^Tx \end{aligned}$$
(44)

Let \(x^*\) be an optimal solution of (44). Then it is easy to see that for any i that corresponds to \(v_i > 0\), we must have \(x^*_i = 0\); because otherwise, one can zero out these \(x_i^*\) to obtain a feasible solution with a strictly smaller objective value. Next, since \({{\mathcal {I}}}\) is nonempty, it is not hard to see that one must have \(x^*_i = 0\) for all i corresponding to \(v_i = 0\), because otherwise, one can further decrease the objective value by setting these entries to zero while increasing some \(x_i^*\) with \(i\in {{\mathcal {I}}}\). Thus, \(x^*_i = 0\) for all i that correspond to \(v_i \ge 0\). Finally, it must hold that \(\Vert x^*\Vert = 1\) because otherwise one can further increase \(x_i^*\) for \(i\in {{\mathcal {I}}}\) to decrease the objective value. Thus, we conclude that \(x^*\) must take the form of (42). Since this \(x^*\) is optimal for (44) and is also feasible for (41), it must also be optimal for (41).

Next, suppose that \({{\mathcal {I}}}\) is empty. This means that v is a nonnegative vector. Observe that for any x feasible for (41), so that \(\Vert x\Vert = 1\) and \(x\ge 0\), we have

$$\begin{aligned} 1 = \Vert x\Vert ^2 \le (e^Tx)^2 \le n \Vert x\Vert ^2 = n, \end{aligned}$$

showing that the following optimization problem is a relaxation of (41):

$$\begin{aligned} \min _{1\le e^Tx \le \sqrt{n}, x\ge 0} v^Tx. \end{aligned}$$
(45)

Since v is nonnegative, a simple scaling argument indicates that any optimal solution \(x^*\) of (45) has to satisfy \(e^Tx^* = 1\). In particular, this shows that the optimal value of (45) is given by \(\min _i v_i\) and the \(x^*\) given by (43) is an optimal solution. Since this \(x^*\) is clearly feasible for (41), it is also optimal for (41). This completes the proof. \(\square \)
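
For concreteness, the solution described in Lemma 7.1 can be computed directly from (42) and (43). The following MATLAB sketch is our own illustration (the function name lemma71_solution is not from the paper) and simply transcribes the two cases:

    function x = lemma71_solution(v)
    % Minimize v'*x over {x : ||x|| = 1, x >= 0}, following Lemma 7.1.
    v = v(:);
    x = zeros(size(v));
    I = find(v < 0);                 % the index set I = {i : v_i < 0}
    if ~isempty(I)
        x(I) = -v(I) / norm(v(I));   % formula (42)
    else
        [~, istar] = min(v);         % any index attaining min_k v_k
        x(istar) = 1;                % formula (43)
    end
    end

For instance, lemma71_solution([-1; 2; -2]) returns \((1/\sqrt{5})(1, 0, 2)^T\), with objective value \(-\sqrt{5}\).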

The next proposition gives an explicit formula for a minimizer of (40).

Proposition 7.1

Let \({{\mathcal {I}}} = \{i:\; \mu _1 < |y_i|\}\).

  1. (i)

    Suppose that \({{\mathcal {I}}}\) is nonempty. Then a solution \(x^*\) of (40) is given by

    $$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} \mathrm{sgn}(y_i)(\mu _2 + \Vert |y_{{\mathcal {I}}}|-\mu _1 e_{{\mathcal {I}}}\Vert )\frac{|y_i| - \mu _1}{\Vert |y_{{\mathcal {I}}}|-\mu _1 e_{{\mathcal {I}}}\Vert } &{} \mathrm{if}\ i \in {{\mathcal {I}}},\\ 0 &{} \mathrm{otherwise}, \end{array}\right. } \end{aligned}$$

    where \(y_{{\mathcal {I}}}\) is the subvector of y indexed by \({{\mathcal {I}}}\), the absolute value \(|y_{{\mathcal {I}}}|\) is taken componentwise, and \(e_{{\mathcal {I}}}\) is the vector of all ones of dimension \(|{{\mathcal {I}}}|\).

  2. (ii)

    Suppose that \({{\mathcal {I}}}\) is empty and take an \(i_*\in \{i:\; \mu _1 - |y_i| = \min _k\{\mu _1 - |y_k|\}\}\). Then a solution \(x^*\) of (40) is given by

    $$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} \mathrm{sgn}(y_{i_*})\max \{\mu _2 - (\mu _1-|y_{i_*}|),0\} &{} \mathrm{if}\ i = i_*,\\ 0 &{} \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$

Proof

Using a substitution \(x = \alpha \circ w\), where \(\alpha \in \{-1,1\}^n\), \(w\ge 0\) and \(\circ \) denotes entrywise product, and expanding the quadratic term, we see that problem (40) can, up to the additive constant \(\frac{1}{2}\Vert y\Vert ^2\), be equivalently written as

$$\begin{aligned} \min _{\alpha \in \{-1,1\}^n,w\ge 0} \frac{1}{2}\Vert w\Vert ^2 - (\alpha \circ y)^Tw + \mu _1 e^Tw - \mu _2 \Vert w\Vert . \end{aligned}$$

Applying a further substitution \(w = ru\) with a number \(r\ge 0\) and a nonnegative vector u satisfying \(\Vert u\Vert = 1\), the above problem can be further reformulated as

$$\begin{aligned} \min _{\alpha \in \{-1,1\}^n,\,u\ge 0,\,\Vert u\Vert =1}\ \min _{r\ge 0}\ \left\{ \frac{1}{2}r^2 + r\left[ (\mu _1e-\alpha \circ y)^Tu - \mu _2\right] \right\} . \end{aligned}$$

It is easy to check that the inner minimization is attained at \(r = \max \{\mu _2 - (\mu _1e-\alpha \circ y)^Tu,0\}\). Plugging this back, the optimization problem now becomes

$$\begin{aligned} \min _{\alpha \in \{-1,1\}^n,\,u\ge 0,\,\Vert u\Vert =1}\ -\frac{1}{2}\left( \max \{\mu _2 - (\mu _1e-\alpha \circ y)^Tu,0\}\right) ^2. \end{aligned}$$

Since the function \(t\mapsto -\frac{1}{2}(\max \{\mu _2 - t,0\})^2\) is nondecreasing, to obtain an optimal solution of the above optimization problem, one only needs to consider the problem

$$\begin{aligned} \min _{\alpha \in \{-1,1\}^n,\,u\ge 0,\,\Vert u\Vert =1}\ (\mu _1e-\alpha \circ y)^Tu. \end{aligned}$$

For this problem, since \(u\ge 0\), we must have \(\alpha ^* = \mathrm{sgn}(y)\) at optimality. This further reduces the above problem to

$$\begin{aligned} \min _{u\ge 0,\Vert u\Vert =1}(\mu _1e-|y|)^Tu. \end{aligned}$$

The conclusion of this proposition now follows from this observation, Lemma 7.1, and the facts that \(x = \alpha \circ (ru)\) with \(r = \max \{\mu _2 - (\mu _1e-\alpha \circ y)^Tu,0\}\) and \(\alpha ^* = \mathrm{sgn}(y)\). \(\square \)
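
To make Proposition 7.1 concrete, its closed-form solution of (40) can be coded directly. The following MATLAB sketch is our own illustration (the function name prox_l1_minus_l2 is not from the paper) and assumes, as in (40), that \(\mu _1 \ge \mu _2 > 0\):

    function x = prox_l1_minus_l2(y, mu1, mu2)
    % Closed-form solution of min_x 0.5*||x - y||^2 + mu1*||x||_1 - mu2*||x||,
    % i.e. problem (40), following Proposition 7.1; requires mu1 >= mu2 > 0.
    y = y(:);
    x = zeros(size(y));
    I = find(abs(y) > mu1);                       % I = {i : mu1 < |y_i|}
    if ~isempty(I)
        w = abs(y(I)) - mu1;                      % |y_I| - mu1*e_I, entrywise positive
        x(I) = sign(y(I)) .* (mu2 + norm(w)) .* (w / norm(w));
    else
        [~, istar] = max(abs(y));                 % an index minimizing mu1 - |y_i|
        x(istar) = sign(y(istar)) * max(mu2 - (mu1 - abs(y(istar))), 0);
    end
    end

For example, prox_l1_minus_l2([3; 0.5], 1, 0.5) returns \((2.5, 0)^T\): only the first entry survives the \(\ell _1\) shrinkage, and the \(-\mu _2\Vert x\Vert \) term then lengthens it from 2 to 2.5.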

Cite this article

Liu, T., Pong, T.K. Further properties of the forward–backward envelope with applications to difference-of-convex programming. Comput Optim Appl 67, 489–520 (2017). https://doi.org/10.1007/s10589-017-9900-2
