Abstract
In this paper, we further study the forward–backward envelope first introduced in Patrinos and Bemporad (Proceedings of the IEEE Conference on Decision and Control, pp 2358–2363, 2013) and Stella et al. (Comput Optim Appl, doi:10.1007/s10589-017-9912-y, 2017) for problems whose objective is the sum of a proper closed convex function and a twice continuously differentiable possibly nonconvex function with Lipschitz continuous gradient. We derive sufficient conditions on the original problem for the corresponding forward–backward envelope to be a level-bounded and Kurdyka–Łojasiewicz function with an exponent of \(\frac{1}{2}\); these results are important for the efficient minimization of the forward–backward envelope by classical optimization algorithms. In addition, we demonstrate how to minimize some difference-of-convex regularized least squares problems by minimizing a suitably constructed forward–backward envelope. Our preliminary numerical results on randomly generated instances of large-scale \(\ell _{1-2}\) regularized least squares problems (Yin et al. in SIAM J Sci Comput 37:A536–A563, 2015) illustrate that an implementation of this approach with a limited-memory BFGS scheme usually outperforms standard first-order methods such as the nonmonotone proximal gradient method in Wright et al. (IEEE Trans Signal Process 57:2479–2493, 2009).
Notes
This is equivalent to \(\liminf _{\Vert x\Vert \rightarrow \infty }g(x) = \infty \); see [4, Page 83].
\(\lambda _{\max }(A^TA)\) is computed via the MATLAB commands opts.issym = 1; lambda = eigs(A*A',1,'LM',opts); when \(m>2000\), and by lambda = norm(A*A') otherwise.
We note in passing that the computation of \(\nabla F_\gamma \) is simple: it involves the proximal mapping of P in (36), which boils down to evaluating the \(\ell _1\) shrinkage operator (the proximal mapping of the \(\ell _1\) norm) and the projection onto B(0, 1). In addition, in our numerical experiments below, the steepest descent direction was never invoked.
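Since (36) is not reproduced here, the following sketch only illustrates the two building blocks just mentioned, the \(\ell _1\) shrinkage operator and the projection onto the unit ball; the function names are ours.

```python
import numpy as np

def soft_threshold(y, lam):
    """l1 shrinkage operator: the proximal mapping of lam * ||.||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def project_ball(y, radius=1.0):
    """Euclidean projection onto the ball B(0, radius)."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else (radius / norm) * y
```

Both maps are separable or radial, so each costs O(n) per evaluation.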
Thus, with high probability, A does not have zero columns.
The dimension parameters are similar to those used in [21, Section 3], except that our s is twice as large. This indicates that the test instances we consider here are harder in terms of sparse recovery.
The CPU times under the FBE column do not include the times for computing \(\lambda _{\max }(A^TA)\). The latter are reported separately in the fourth column of each table.
For all three algorithms, we output the \(z^k\) at termination and compute the function value \(h(z^k)\).
References
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)
Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer, Berlin (2003)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Byrd, R.H., Chin, G.M., Nocedal, J., Oztoprak, F.: A family of second-order methods for convex \(\ell _1\)-regularized optimization. Math. Program. 159, 435–467 (2016)
Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005)
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)
Chen, X., Lu, Z., Pong, T.K.: Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 26, 1465–1492 (2016)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems Volume I/II. Springer, Berlin (2003)
Fan, J., Li, R.: Variable selection via nonconvex penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Friedlander, M., Goh, G.: Efficient evaluation of scaled proximal operators. Preprint arXiv:1603.05719 (2016)
Gong, P., Zhang, C., Lu, Z., Huang, J.Z., Ye, J.: A general iterative shrinkage and thresholding algorithm for nonconvex regularized optimization problems. Proc. Int. Conf. Mach. Learn. 28, 37–45 (2013)
Griesse, R., Lorenz, D.A.: A semismooth Newton method for Tikhonov functionals with sparsity constraints. Inverse Probl. 24, 035007 (2008)
Kan, C., Song, W.: The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Anal. Theory Methods Appl. 75, 1385–1399 (2012)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24, 1420–1443 (2014)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Preprint arXiv:1602.02915 (2016)
Lu, Z., Pong, T.K., Zhang, Y.: An alternating direction method for finding Dantzig selectors. Comput. Stat. Data Anal. 56, 4037–4046 (2012)
Luo, Z.Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30, 408–425 (1992)
Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46, 157–178 (1993)
Luo, Z.Q., Tseng, P.: On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res. 18, 846–867 (1993)
Milzarek, A., Ulbrich, M.: A semismooth Newton method with multidimensional filter globalization for \(\ell _1\)-optimization. SIAM J. Optim. 24, 298–333 (2014)
Nocedal, J., Wright, S.J.: Numerical Optimization, 1st edn. Springer, Berlin (1999)
Noll, D., Rondepierre, A.: Convergence of linesearch and trust-region methods using the Kurdyka–Łojasiewicz inequality. In: Bailey, D.H., Bauschke, H.H., Borwein, P., Garvan, F., Théra, M., Vanderwerff, J.D., Wolkowicz, H. (eds.) Computational and Analytical Mathematics. Springer, Berlin (2013)
Patrinos, P., Bemporad, A.: Proximal Newton methods for convex composite optimization. In: Proceedings of the IEEE Conference on Decision and Control, pp. 2358–2363. (2013)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
Stella, L., Themelis, A., Patrinos, P.: Forward–backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. (2017). doi:10.1007/s10589-017-9912-y
Tseng, P., Yun, S.: A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Comput. Optim. Appl. 47, 179–206 (2010)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. Ser. B 125, 263–295 (2010)
Wang, Y., Luo, Z., Zhang, X.: New improved penalty methods for sparse reconstruction based on difference of two norms. Preprint. doi:10.13140/RG.2.1.3256.3369 (2015)
Wright, S.J., Nowak, R., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Xiao, X., Li, Y., Wen, Z., Zhang, L.: Semi-smooth second-order type methods for composite convex programs. Preprint arXiv:1603.07870 (2016)
Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37, A536–A563 (2015)
Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhou, Z., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Preprint arXiv:1512.03518 (2015)
Additional information
Tianxiang Liu was supported in part by the AMSS-PolyU Joint Research Institute Postdoctoral Scheme.
Ting Kei Pong was supported in part by Hong Kong Research Grants Council PolyU253008/15p.
Appendix: closed form formula for NPG subproblems
In this appendix, we derive a closed form formula for the following problem,
where \(\mu _1 \ge \mu _2 > 0\) and \(y\in \mathbb {R}^n\) is given. It is easy to see that (40) covers (37) as a special case and hence we will obtain a closed form formula for these NPG subproblems. To proceed with our derivation, we first establish the following lemma.
Lemma 7.1
Let \(v\in \mathbb {R}^n\) and define \({{\mathcal {I}}}:= \{i:\; v_i < 0\}\). Consider the optimization problem
-
(i)
Suppose that \({{\mathcal {I}}}\ne \emptyset \). Then an optimal solution \(x^*\) of (41) is given by
$$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} -\frac{v_i}{\Vert v_{{{\mathcal {I}}}}\Vert } &{} \mathrm{if}\ i \in {{\mathcal {I}}},\\ 0 &{} \mathrm{otherwise}, \end{array}\right. } \end{aligned}$$(42)where \(v_{{\mathcal {I}}}\) is the subvector of v indexed by \({{\mathcal {I}}}\).
-
(ii)
Suppose that \({{\mathcal {I}}} = \emptyset \) and take an \(i_*\in \{i:\; v_i = \min _k v_k\}\). Then an optimal solution \(x^*\) of (41) is given by
$$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} 1 &{} \mathrm{if}\ i = i_*,\\ 0 &{} \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$(43)
Proof
It is clear that an optimal solution of (41) exists.
Suppose first that \({{\mathcal {I}}} \ne \emptyset \). In this case, we consider the following relaxation of (41):
Let \(x^*\) be an optimal solution of (44). For any i with \(v_i > 0\), we must have \(x^*_i = 0\): otherwise, setting these entries to zero would yield a feasible solution with a strictly smaller objective value. Next, since \({{\mathcal {I}}}\) is nonempty, we must also have \(x^*_i = 0\) for all i with \(v_i = 0\): otherwise, one could further decrease the objective value by setting these entries to zero while increasing some \(x_i^*\) with \(i\in {{\mathcal {I}}}\). Hence \(x^*_i = 0\) for all i with \(v_i \ge 0\). Finally, it must hold that \(\Vert x^*\Vert = 1\), because otherwise one could increase some \(x_i^*\) with \(i\in {{\mathcal {I}}}\) to further decrease the objective value. We conclude that \(x^*\) must take the form of (42). Since this \(x^*\) is optimal for (44) and feasible for (41), it is also optimal for (41).
Next, suppose that \({{\mathcal {I}}}\) is empty. This means that v is a nonnegative vector. Observe that for any x feasible for (41) so that \(\Vert x\Vert = 1\), we have
showing that the following optimization problem is a relaxation of (41):
Since v is nonnegative, a simple scaling argument indicates that any optimal solution \(x^*\) of (45) has to satisfy \(e^Tx^* = 1\). In particular, this shows that the optimal value of (45) is given by \(\min _i v_i\) and the \(x^*\) given by (43) is an optimal solution. Since this \(x^*\) is clearly feasible for (41), it is also optimal for (41). This completes the proof. \(\square \)
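The two cases of Lemma 7.1 admit a direct implementation. The sketch below assumes, consistently with the proof, that (41) minimizes \(v^Tx\) over nonnegative x with \(\Vert x\Vert = 1\); the function name is ours.

```python
import numpy as np

def lemma71_solution(v):
    """Closed-form minimizer of v^T x over {x >= 0, ||x|| = 1},
    following cases (i)/(ii) of Lemma 7.1, i.e., formulas (42) and (43)."""
    v = np.asarray(v, dtype=float)
    I = v < 0                      # the index set I = {i : v_i < 0}
    x = np.zeros_like(v)
    if I.any():                    # case (i): weight spread over the negative entries
        x[I] = -v[I] / np.linalg.norm(v[I])
    else:                          # case (ii): a unit coordinate at a minimizer of v
        x[int(np.argmin(v))] = 1.0
    return x
```

A quick sanity check against random feasible points confirms optimality in both cases on small examples.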
The next proposition gives an explicit formula for a minimizer of (40).
Proposition 7.1
Let \({{\mathcal {I}}} = \{i:\; \mu _1 < |y_i|\}\).
-
(i)
Suppose that \({{\mathcal {I}}}\) is nonempty. Then a solution \(x^*\) of (40) is given by
$$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} \mathrm{sgn}(y_i)(\mu _2 + \Vert |y_{{\mathcal {I}}}|-\mu _1 e_{{\mathcal {I}}}\Vert )\frac{|y_i| - \mu _1}{\Vert |y_{{\mathcal {I}}}|-\mu _1 e_{{\mathcal {I}}}\Vert } &{} \mathrm{if}\ i \in {{\mathcal {I}}},\\ 0 &{} \mathrm{otherwise}, \end{array}\right. } \end{aligned}$$where \(y_{{\mathcal {I}}}\) is the subvector of y indexed by \({{\mathcal {I}}}\), the absolute value \(|y_{{\mathcal {I}}}|\) is taken componentwise, and \(e_{{\mathcal {I}}}\) is the vector of all ones of dimension \(|{{\mathcal {I}}}|\).
-
(ii)
Suppose that \({{\mathcal {I}}}\) is empty and take an \(i_*\in \{i:\; \mu _1 - |y_i| = \min _k\{\mu _1 - |y_k|\}\}\). Then a solution \(x^*\) of (40) is given by
$$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} \mathrm{sgn}(y_{i_*})\max \{\mu _2 - (\mu _1-|y_{i_*}|),0\} &{} \mathrm{if}\ i = i_*,\\ 0 &{} \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$
Proof
Using a substitution \(x = \alpha \circ w\), where \(\alpha \in \{-1,1\}^n\), \(w\ge 0\) and \(\circ \) denotes entrywise product, and expanding the quadratic term, we see that problem (40) can be equivalently written as
Applying a further substitution \(w = ru\) with a scalar \(r\ge 0\) and a nonnegative vector u satisfying \(\Vert u\Vert = 1\), the above problem can be further reformulated as
It is easy to check that the inner minimization is attained at \(r = \max \{\mu _2 - (\mu _1e-\alpha \circ y)^Tu,0\}\). Plugging this back, the optimization problem now becomes
Since the function \(t\mapsto -\frac{1}{2}(\max \{\mu _2 - t,0\})^2\) is nondecreasing, to obtain an optimal solution of the above optimization problem, one only needs to consider the problem
For this problem, since \(u\ge 0\), we must have \(\alpha ^* = \mathrm{sgn}(y)\) at optimality. This further reduces the above problem to
The conclusion of this proposition now follows from this observation, Lemma 7.1, and the facts that \(x = \alpha \circ (ru)\) with \(r = \max \{\mu _2 - (\mu _1e-\alpha \circ y)^Tu,0\}\) and \(\alpha ^* = \mathrm{sgn}(y)\). \(\square \)
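The two cases of Proposition 7.1 can likewise be implemented directly. The sketch below assumes that (40) is the proximal-type subproblem \(\min _x \frac{1}{2}\Vert x-y\Vert ^2 + \mu _1\Vert x\Vert _1 - \mu _2\Vert x\Vert \) with \(\mu _1 \ge \mu _2 > 0\), which is our reading of the elided problem statement given the \(\ell _{1-2}\) context; the function name is ours.

```python
import numpy as np

def prop71_solution(y, mu1, mu2):
    """Closed-form minimizer following Proposition 7.1, under our reading of (40):
    min_x 0.5*||x - y||^2 + mu1*||x||_1 - mu2*||x||, with mu1 >= mu2 > 0."""
    y = np.asarray(y, dtype=float)
    x = np.zeros_like(y)
    I = np.abs(y) > mu1                  # the index set I = {i : mu1 < |y_i|}
    if I.any():                          # case (i)
        s = np.abs(y[I]) - mu1           # the vector |y_I| - mu1*e_I
        x[I] = np.sign(y[I]) * (mu2 + np.linalg.norm(s)) * s / np.linalg.norm(s)
    else:                                # case (ii): at most one nonzero entry
        i = int(np.argmax(np.abs(y)))    # an i_* minimizing mu1 - |y_k|
        x[i] = np.sign(y[i]) * max(mu2 - (mu1 - abs(y[i])), 0.0)
    return x
```

In case (i) the formula amounts to soft-thresholding y at level \(\mu _1\) and then rescaling the result to have norm \(\mu _2\) plus its original norm.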
Liu, T., Pong, T.K. Further properties of the forward–backward envelope with applications to difference-of-convex programming. Comput Optim Appl 67, 489–520 (2017). https://doi.org/10.1007/s10589-017-9900-2