Further properties of the forward–backward envelope with applications to difference-of-convex programming

Abstract

In this paper, we further study the forward–backward envelope first introduced in Patrinos and Bemporad (Proceedings of the IEEE Conference on Decision and Control, pp 2358–2363, 2013) and Stella et al. (Comput Optim Appl, doi:10.1007/s10589-017-9912-y, 2017) for problems whose objective is the sum of a proper closed convex function and a twice continuously differentiable possibly nonconvex function with Lipschitz continuous gradient. We derive sufficient conditions on the original problem for the corresponding forward–backward envelope to be a level-bounded and Kurdyka–Łojasiewicz function with an exponent of \(\frac{1}{2}\); these results are important for the efficient minimization of the forward–backward envelope by classical optimization algorithms. In addition, we demonstrate how to minimize some difference-of-convex regularized least squares problems by minimizing a suitably constructed forward–backward envelope. Our preliminary numerical results on randomly generated instances of large-scale \(\ell _{1-2}\) regularized least squares problems (Yin et al. in SIAM J Sci Comput 37:A536–A563, 2015) illustrate that an implementation of this approach with a limited-memory BFGS scheme usually outperforms standard first-order methods such as the nonmonotone proximal gradient method in Wright et al. (IEEE Trans Signal Process 57:2479–2493, 2009).

Notes

  1. This is equivalent to \(\liminf _{\Vert x\Vert \rightarrow \infty }g(x) = \infty \); see [4, Page 83].

  2. \(\lambda _{\max }(A^TA)\) is computed via the MATLAB commands opts.issym = 1; lambda = eigs(A*A',1,'LM',opts); when \(m>2000\), and via lambda = norm(A*A') otherwise.

  3. We note in passing that the computation of \(\nabla F_\gamma \) is simple: it involves the proximal mapping of P in (36), which boils down to evaluating the \(\ell _1\) shrinkage operator (the proximal mapping of the \(\ell _1\) norm) and the projection onto B(0, 1); a minimal sketch of these two maps is given right after these notes. In addition, in our numerical experiments below, the steepest descent direction was never invoked.

  4. Thus, with high probability, A does not have zero columns.

  5. The dimension parameters are similar to those used in [21, Section 3], except that our s is twice as large. This indicates that the test instances we consider here are harder in terms of sparse recovery.

  6. The CPU times under the FBE column do not include the times for computing \(\lambda _{\max }(A^TA)\). The latter are reported separately in the fourth column of each table.

  7. For all three algorithms, we output the \(z^k\) at termination and compute the function value \(h(z^k)\).
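
The two maps mentioned in note 3 are completely standard. As an illustration only (the names shrink_l1 and proj_ball are ours and not part of the paper's code), a minimal MATLAB sketch could read:

    shrink_l1 = @(z, t) sign(z) .* max(abs(z) - t, 0);     % proximal mapping of t*||.||_1 (soft-thresholding)
    proj_ball = @(z, r) z * min(1, r / max(norm(z), eps));  % Euclidean projection onto the closed ball B(0, r)

For example, shrink_l1([3; -0.2; 1], 1) returns \((2, 0, 0)^T\), and proj_ball leaves any vector of norm at most r unchanged.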

References

  1. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)

  2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

  3. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)

  4. Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer, Berlin (2003)

  5. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)

  6. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)

  7. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)

  8. Byrd, R.H., Chin, G.M., Nocedal, J., Oztoprak, F.: A family of second-order methods for convex \(\ell _1\)-regularized optimization. Math. Program. 159, 435–467 (2016)

  9. Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005)

  10. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)

  11. Chen, X., Lu, Z., Pong, T.K.: Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 26, 1465–1492 (2016)

  12. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)

  13. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems Volume I/II. Springer, Berlin (2003)

  14. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  15. Friedlander, M., Goh, G.: Efficient evaluation of scaled proximal operators. Preprint arXiv:1603.05719 (2016)

  16. Gong, P., Zhang, C., Lu, Z., Huang, J.Z., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. Proc. Int. Conf. Mach. Learn. 28, 37–45 (2013)

  17. Griesse, R., Lorenz, D.A.: A semismooth Newton method for Tikhonov functionals with sparsity constraints. Inverse Probl. 24, 035007 (2008)

  18. Kan, C., Song, W.: The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Anal. Theory Methods Appl. 75, 1385–1399 (2012)

  19. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24, 1420–1443 (2014)

  20. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Preprint arXiv:1602.02915 (2016)

  21. Lu, Z., Pong, T.K., Zhang, Y.: An alternating direction method for finding Dantzig selectors. Comput. Stat. Data Anal. 56, 4037–4046 (2012)

  22. Luo, Z.Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30, 408–425 (1992)

  23. Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46, 157–178 (1993)

  24. Luo, Z.Q., Tseng, P.: On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res. 18, 846–867 (1993)

  25. Milzarek, A., Ulbrich, M.: A semismooth Newton method with multidimensional filter globalization for \(\ell _1\)-optimization. SIAM J. Optim. 24, 298–333 (2014)

  26. Nocedal, J., Wright, S.J.: Numerical Optimization, 1st edn. Springer, Berlin (1999)

  27. Noll, D., Rondepierre, A.: Convergence of linesearch and trust-region methods using the Kurdyka–Łojasiewicz inequality. In: Bailey, D.H., Bauschke, H.H., Borwein, P., Garvan, F., Théra, M., Vanderwerff, J.D., Wolkowicz, H. (eds.) Computational and Analytical Mathematics. Springer, Berlin (2013)

  28. Patrinos, P., Bemporad, A.: Proximal Newton methods for convex composite optimization. In: Proceedings of the IEEE Conference on Decision and Control, pp. 2358–2363. (2013)

  29. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)

  30. Stella, L., Themelis, A., Patrinos, P.: Forward–backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. (2017). doi:10.1007/s10589-017-9912-y

  31. Tseng, P., Yun, S.: A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Comput. Optim. Appl. 47, 179–206 (2010)

  32. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)

  33. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. Ser. B 125, 263–295 (2010)

  34. Wang, Y., Luo, Z., Zhang, X.: New improved penalty methods for sparse reconstruction based on difference of two norms. Preprint. doi:10.13140/RG.2.1.3256.3369 (2015)

  35. Wright, S.J., Nowak, R., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)

  36. Xiao, X., Li, Y., Wen, Z., Zhang, L.: Semi-smooth second-order type methods for composite convex programs. Preprint arXiv:1603.07870 (2016)

  37. Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37, A536–A563 (2015)

  38. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

  39. Zhou, Z., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Preprint arXiv:1512.03518 (2015)

Author information

Correspondence to Ting Kei Pong.

Additional information

Tianxiang Liu was supported partly by the AMSS-PolyU Joint Research Institute Postdoctoral Scheme.

Ting Kei Pong was supported partly by Hong Kong Research Grants Council PolyU253008/15p.

Appendix: closed form formula for NPG subproblems

In this appendix, we derive a closed form formula for the following problem,

$$\begin{aligned} \min _{x} \ \frac{1}{2}\Vert x - y\Vert ^2 + \mu _1 \Vert x\Vert _1 - \mu _2 \Vert x\Vert , \end{aligned}$$
(40)

where \(\mu _1 \ge \mu _2 > 0\) and \(y\in \mathbb {R}^n\) is given. It is easy to see that (40) covers (37) as a special case and hence we will obtain a closed form formula for these NPG subproblems. To proceed with our derivation, we first establish the following lemma.

Lemma 7.1

Let \(v\in \mathbb {R}^n\) and define \({{\mathcal {I}}}:= \{i:\; v_i < 0\}\). Consider the optimization problem

$$\begin{aligned} \min _{\Vert x\Vert = 1, x\ge 0} v^Tx \end{aligned}$$
(41)
  1. (i)

    Suppose that \({{\mathcal {I}}}\ne \emptyset \). Then an optimal solution \(x^*\) of (41) is given by

    $$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} -\frac{v_i}{\Vert v_{{{\mathcal {I}}}}\Vert } &{} \mathrm{if}\ i \in {{\mathcal {I}}},\\ 0 &{} \mathrm{otherwise}, \end{array}\right. } \end{aligned}$$
    (42)

    where \(v_{{\mathcal {I}}}\) is the subvector of v indexed by \({{\mathcal {I}}}\).

  2. (ii)

    Suppose that \({{\mathcal {I}}} = \emptyset \) and take an \(i_*\in \{i:\; v_i = \min _k v_k\}\). Then an optimal solution \(x^*\) of (41) is given by

    $$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} 1 &{} \mathrm{if}\ i = i_*,\\ 0 &{} \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$
    (43)

Proof

It is clear that an optimal solution of (41) exists.

Suppose first that \({{\mathcal {I}}} \ne \emptyset \). In this case, we consider the following relaxation of (41):

$$\begin{aligned} \min _{\Vert x\Vert \le 1, x\ge 0} v^Tx \end{aligned}$$
(44)

Let \(x^*\) be an optimal solution of (44). Then it is easy to see that for any i that corresponds to \(v_i > 0\), we must have \(x^*_i = 0\); because otherwise, one can zero out these \(x_i^*\) to obtain a feasible solution with a strictly smaller objective value. Next, since \({{\mathcal {I}}}\) is nonempty, it is not hard to see that one must have \(x^*_i = 0\) for all i corresponding to \(v_i = 0\), because otherwise, one can further decrease the objective value by setting these entries to zero while increasing some \(x_i^*\) with \(i\in {{\mathcal {I}}}\). Thus, \(x^*_i = 0\) for all i that correspond to \(v_i \ge 0\). Finally, it must hold that \(\Vert x^*\Vert = 1\) because otherwise one can further increase \(x_i^*\) for \(i\in {{\mathcal {I}}}\) to decrease the objective value. Thus, we conclude that \(x^*\) must take the form of (42). Since this \(x^*\) is optimal for (44) and is also feasible for (41), it must also be optimal for (41).

Next, suppose that \({{\mathcal {I}}}\) is empty. This means that v is a nonnegative vector. Observe that for any x feasible for (41), so that \(\Vert x\Vert = 1\) and \(x\ge 0\), we have

$$\begin{aligned} 1 = \Vert x\Vert ^2 \le (e^Tx)^2 \le n \Vert x\Vert ^2 = n, \end{aligned}$$

showing that the following optimization problem is a relaxation of (41):

$$\begin{aligned} \min _{1\le e^Tx \le \sqrt{n}, x\ge 0} v^Tx. \end{aligned}$$
(45)

Since v is nonnegative, a simple scaling argument indicates that any optimal solution \(x^*\) of (45) has to satisfy \(e^Tx^* = 1\). In particular, this shows that the optimal value of (45) is given by \(\min _i v_i\) and the \(x^*\) given by (43) is an optimal solution. Since this \(x^*\) is clearly feasible for (41), it is also optimal for (41). This completes the proof. \(\square \)
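
For concreteness, the solution described in Lemma 7.1 can be computed directly from (42) and (43). The following MATLAB sketch is our own illustration (the function name lemma71_solution is not from the paper) and simply transcribes the two cases:

    function x = lemma71_solution(v)
    % Minimize v'*x over {x : ||x|| = 1, x >= 0}, following Lemma 7.1.
    v = v(:);
    x = zeros(size(v));
    I = find(v < 0);                 % the index set I = {i : v_i < 0}
    if ~isempty(I)
        x(I) = -v(I) / norm(v(I));   % formula (42)
    else
        [~, istar] = min(v);         % any index attaining min_k v_k
        x(istar) = 1;                % formula (43)
    end
    end

For instance, lemma71_solution([-1; 2; -2]) returns \((1/\sqrt{5})(1, 0, 2)^T\), with objective value \(-\sqrt{5}\).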

The next proposition gives an explicit formula for a minimizer of (40).

Proposition 7.1

Let \({{\mathcal {I}}} = \{i:\; \mu _1 < |y_i|\}\).

  1. (i)

    Suppose that \({{\mathcal {I}}}\) is nonempty. Then a solution \(x^*\) of (40) is given by

    $$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} \mathrm{sgn}(y_i)(\mu _2 + \Vert |y_{{\mathcal {I}}}|-\mu _1 e_{{\mathcal {I}}}\Vert )\frac{|y_i| - \mu _1}{\Vert |y_{{\mathcal {I}}}|-\mu _1 e_{{\mathcal {I}}}\Vert } &{} \mathrm{if}\ i \in {{\mathcal {I}}},\\ 0 &{} \mathrm{otherwise}, \end{array}\right. } \end{aligned}$$

    where \(y_{{\mathcal {I}}}\) is the subvector of y indexed by \({{\mathcal {I}}}\), the absolute value \(|y_{{\mathcal {I}}}|\) is taken componentwise, and \(e_{{\mathcal {I}}}\) is the vector of all ones of dimension \(|{{\mathcal {I}}}|\).

  2. (ii)

    Suppose that \({{\mathcal {I}}}\) is empty and take an \(i_*\in \{i:\; \mu _1 - |y_i| = \min _k\{\mu _1 - |y_k|\}\}\). Then a solution \(x^*\) of (40) is given by

    $$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} \mathrm{sgn}(y_{i_*})\max \{\mu _2 - (\mu _1-|y_{i_*}|),0\} &{} \mathrm{if}\ i = i_*,\\ 0 &{} \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$

Proof

Using a substitution \(x = \alpha \circ w\), where \(\alpha \in \{-1,1\}^n\), \(w\ge 0\) and \(\circ \) denotes entrywise product, and expanding the quadratic term, we see that problem (40) can, up to the additive constant \(\frac{1}{2}\Vert y\Vert ^2\), be equivalently written as

$$\begin{aligned} \min _{\alpha \in \{-1,1\}^n,w\ge 0} \frac{1}{2}\Vert w\Vert ^2 - (\alpha \circ y)^Tw + \mu _1 e^Tw - \mu _2 \Vert w\Vert . \end{aligned}$$

Applying a further substitution \(w = ru\) with a number \(r\ge 0\) and a nonnegative vector u satisfying \(\Vert u\Vert = 1\), the above problem can be further reformulated as

$$\begin{aligned} \min _{\alpha \in \{-1,1\}^n,\,u\ge 0,\,\Vert u\Vert =1}\ \min _{r\ge 0}\ \left\{ \frac{1}{2}r^2 + r\left[ (\mu _1e-\alpha \circ y)^Tu - \mu _2\right] \right\} . \end{aligned}$$

It is easy to check that the inner minimization is attained at \(r = \max \{\mu _2 - (\mu _1e-\alpha \circ y)^Tu,0\}\). Plugging this back, the optimization problem now becomes

$$\begin{aligned} \min _{\alpha \in \{-1,1\}^n,\,u\ge 0,\,\Vert u\Vert =1}\ -\frac{1}{2}\left( \max \{\mu _2 - (\mu _1e-\alpha \circ y)^Tu,0\}\right) ^2. \end{aligned}$$

Since the function \(t\mapsto -\frac{1}{2}(\max \{\mu _2 - t,0\})^2\) is nondecreasing, to obtain an optimal solution of the above optimization problem, one only needs to consider the problem

$$\begin{aligned} \min _{\alpha \in \{-1,1\}^n,\,u\ge 0,\,\Vert u\Vert =1}\ (\mu _1e-\alpha \circ y)^Tu. \end{aligned}$$

For this problem, since \(u\ge 0\), we must have \(\alpha ^* = \mathrm{sgn}(y)\) at optimality. This further reduces the above problem to

$$\begin{aligned} \min _{u\ge 0,\Vert u\Vert =1}(\mu _1e-|y|)^Tu. \end{aligned}$$

The conclusion of this proposition now follows from this observation, Lemma 7.1, and the facts that \(x = \alpha \circ (ru)\) with \(r = \max \{\mu _2 - (\mu _1e-\alpha \circ y)^Tu,0\}\) and \(\alpha ^* = \mathrm{sgn}(y)\). \(\square \)
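
To make Proposition 7.1 concrete, its closed-form solution of (40) can be coded directly. The following MATLAB sketch is our own illustration (the function name prox_l1_minus_l2 is not from the paper) and assumes, as in (40), that \(\mu _1 \ge \mu _2 > 0\):

    function x = prox_l1_minus_l2(y, mu1, mu2)
    % Closed-form solution of min_x 0.5*||x - y||^2 + mu1*||x||_1 - mu2*||x||,
    % i.e. problem (40), following Proposition 7.1; requires mu1 >= mu2 > 0.
    y = y(:);
    x = zeros(size(y));
    I = find(abs(y) > mu1);                       % I = {i : mu1 < |y_i|}
    if ~isempty(I)
        w = abs(y(I)) - mu1;                      % |y_I| - mu1*e_I, entrywise positive
        x(I) = sign(y(I)) .* (mu2 + norm(w)) .* (w / norm(w));
    else
        [~, istar] = max(abs(y));                 % an index minimizing mu1 - |y_i|
        x(istar) = sign(y(istar)) * max(mu2 - (mu1 - abs(y(istar))), 0);
    end
    end

For example, prox_l1_minus_l2([3; 0.5], 1, 0.5) returns \((2.5, 0)^T\): only the first entry survives the \(\ell _1\) shrinkage, and the \(-\mu _2\Vert x\Vert \) term then lengthens it from 2 to 2.5.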

Cite this article

Liu, T., Pong, T.K. Further properties of the forward–backward envelope with applications to difference-of-convex programming. Comput Optim Appl 67, 489–520 (2017). https://doi.org/10.1007/s10589-017-9900-2
