Abstract
In this paper we consider \(l_0\) regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving \(l_0\) regularized box constrained convex programming, and show that the sequence generated by these methods converges to a local minimizer. We also establish the iteration complexity of the IHT method for finding an \(\epsilon \)-local-optimal solution. We then propose a method for solving \(l_0\) regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation, and establish its iteration complexity for finding an \(\epsilon \)-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and show that every accumulation point of the generated sequence is a local minimizer of the problem.
Notes
Such a convergence result of the IHT method when applied to \(l_0\)-regularized and -constrained least squares problems was already established by Blumensath and Davies [5].
\(\phi \) is strongly convex with modulus \(\sigma >0\) if \(\phi (\cdot ) - \frac{\sigma }{2}\Vert \cdot \Vert ^2\) is convex.
The existence of such \(L\) is proven in the “Appendix”.
References
Bahmani, S., Raj, B., Boufounos, P.: Greedy sparsity-constrained optimization. J. Mach. Learn. Res. 14, 807–841 (2013)
Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988)
Blumensath, T.: Accelerated iterative hard thresholding. Signal Process. 92(3), 752–756 (2012)
Blumensath, T.: Compressed sensing with nonlinear observations and related nonlinear optimization problems. IEEE Trans. Inf. Theory 59(6), 3466–3474 (2013)
Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)
Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)
Blumensath, T., Davies, M.E.: Normalized iterative hard thresholding: guaranteed stability and performance. IEEE J. Sel. Top. Signal Process. 4(2), 298–309 (2010)
Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
Cevher, V.: On accelerated hard thresholding methods for sparse approximation. In: Proceedings of SPIE, Conference on Wavelets and Sparsity XIV. San Diego, CA (2011)
Dempster, A.: Covariance selection. Biometrics 28, 157–175 (1978)
Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
Foucart, S.: Hard thresholding pursuit: an algorithm for compressive sensing. SIAM J. Numer. Anal. 49(6), 2543–2563 (2011)
Herrity, K.K., Gilbert, A.C., Tropp, J.A.: Sparse approximation via iterative thresholding. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2006)
Jain, P., Meka, R., Dhillon, I.: Guaranteed rank minimization via singular value projection. In: Advances in Neural Information Processing Systems, pp. 937–945 (2010)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138, 115–139 (2013)
Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. To appear in SIAM J. Optim. (2013)
Maleki, A.: Coherence analysis of iterative thresholding algorithms. In: Proceedings of 47th Annual Allerton Conference on Communication, Control, and Computing, pp. 236–243 (2009)
Mallat, S., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, MA (2004)
Ng, A.Y.: Feature selection, \(l_1\) vs. \(l_2\) regularization, and rotational invariance. In: Proceedings of the Twenty-First International Conference on Machine learning (ICML), pp. 72–85 (2004)
Nikolova, M.: Description of the minimizers of least squares regularized with \(l_0\) norm. Report HAL-00723812, CMLA-CNRS ENS Cachan, France (2011)
Tanner, J., Wei, K.: Normalized iterative hard thresholding for matrix completion. To appear in SIAM J. Sci. Comput. (2012)
Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)
Acknowledgments
The author is grateful to Ting Kei Pong for proofreading and suggestions. He would also like to thank the anonymous referees for pointing out many relevant references and for suggestions to improve the presentation.
Additional information
This work was supported in part by an NSERC Discovery Grant. Part of this work was conducted during the author’s sabbatical leave in the Department of Industrial and Systems Engineering at Texas A&M University. The author would like to thank them for hosting his visit.
Appendix
In this appendix we show that if \(f\) is smooth and strongly convex, there exists \(L>L_f\) such that \(\alpha >0\), which is required in Theorem 3.5. In particular, we show that there are at most \(3n2^{n-1}\) values of \(L\) for which \(\alpha =0\); in other words, \(\alpha >0\) for all \(L>0\) except at most \(3n2^{n-1}\) points. Therefore, \(\alpha >0\) holds almost surely for a randomly chosen \(L\), and an arbitrary \(L>L_f\) is usually sufficient for a practical implementation of the IHT method for solving problem (12).
To this end, assume that \(f\) is smooth and strongly convex, as imposed in Theorem 3.5, and let \(\alpha \) be defined in (29). We now show that \(\alpha >0\) for all \(L>0\) except at most \(3n2^{n-1}\) values. For simplicity, we only consider the case where \(-\infty <l_i<0\) and \(0<u_i<\infty \) for all \(i\); the proof is similar for the other cases. Let \(\mathcal{B }_I,\,\Pi _\mathcal{B }(\cdot )\) and \(s_L(\cdot )\) be defined in (15)–(17), respectively. Given any \(I \subseteq \{1,\ldots ,n\}\) and \(1 \le i \le n\), let
$$\begin{aligned} h_i(L;I) = {[\Pi _\mathcal{B }(s_L(x^*))]}_i \left( 2{[s_L(x^*)]}_i - {[\Pi _\mathcal{B }(s_L(x^*))]}_i\right) - \frac{2\lambda }{L}, \end{aligned}$$
where \(x^* \in {\mathrm{Arg}}\min \{f(x): x\in \mathcal{B }_I\}\). Clearly, \(h_i(L;I)\) is well-defined since \(x^*\) is unique for each \(I\) due to the strong convexity of \(f\). Notice that \(\alpha \) as defined in (29) can be rewritten as
As we will show below, for each \(I \subseteq \{1,\ldots ,n\}\), \(h_i(L;I)=0\) has at most two positive roots \(L\) for every \(i\in I\) and at most one positive root \(L\) for any \(i \notin I\). This together with (66) implies that for each \(I \subseteq \{1,\ldots ,n\}\) there exist at most
$$\begin{aligned} 2|I| + (n-|I|) = n+|I| \end{aligned}$$values of \(L\) such that \(\alpha =0\), where \(|I|\) denotes the size of \(I\). Therefore, the total number of values of \(L\) such that \(\alpha =0\) is at most
$$\begin{aligned} \sum \limits _{I \subseteq \{1,\ldots ,n\}} \left( n+|I|\right) = \sum \limits _{k=0}^{n} {n \choose k} (n+k) = 3n2^{n-1}. \end{aligned}$$
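As a quick brute-force check (illustrative only, not part of the paper), one can count \(2|I|+(n-|I|)=n+|I|\) potential roots per subset \(I\) and sum over all \(I \subseteq \{1,\ldots ,n\}\) for small \(n\), confirming the total \(3n2^{n-1}\):

```python
from itertools import combinations

# Brute-force check (illustrative, not from the paper): summing the per-subset
# bound 2|I| + (n - |I|) = n + |I| over all subsets I of {1, ..., n} should
# equal 3n * 2^(n-1), the total claimed in the appendix.
def total_root_bound(n):
    return sum(n + k                                # n + |I| roots per subset
               for k in range(n + 1)                # subset size |I| = k
               for _ in combinations(range(n), k))  # each subset of size k

for n in range(1, 8):
    assert total_root_bound(n) == 3 * n * 2 ** (n - 1)
print("bound verified for n = 1, ..., 7")
```

The identity also follows analytically from \(\sum _k {n \choose k}(n+k) = n2^n + n2^{n-1}\).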
It remains to show that for each \(I \subseteq \{1,\ldots ,n\},\,h_i(L;I)=0\) has at most two positive roots \(L\) for every \(i\in I\) and at most one positive root \(L\) for any \(i \notin I\). We prove this result by considering all possible cases below. Let \(i\in \{1,\ldots ,n\}\) and \(x^* =\arg \min \{f(x): x\in \mathcal{B }_I\}\).
- (1) \({[\nabla f(x^*)]}_i=0\): one can see that \({[s_L(x^*)]}_i=x^*_i\). Hence, \({[\Pi _{\mathcal{B }}(s_L(x^*))]}_i= x^*_i\) and \(h_i(L;I)=(x^*_i)^2-\frac{2\lambda }{L}\). It follows that \(h_i(L;I)=0\) has at most one positive root \(L\).
- (2) \({[\nabla f(x^*)]}_i \ne 0\): by the first-order optimality condition for \(x^* = \arg \min \{f(x): \ x\in \mathcal{B }_I\}\), one can observe that if \(i \notin I\) and \({[\nabla f(x^*)]}_i < 0\), then \(x^*_i=u_i\); if \(i \notin I\) and \({[\nabla f(x^*)]}_i > 0\), then \(x^*_i=l_i\). Hence, this case can be divided into three subcases:
- (2a) \(i\in I\): it follows that \(x^*_i=0\) and \({[s_L(x^*)]}_i = -\frac{1}{L} \nabla _i f(x^*)\). Let
$$\begin{aligned} \theta = \max \left( -\frac{\nabla _i f(x^*)}{l_i},-\frac{\nabla _i f(x^*)}{u_i}\right) \!. \end{aligned}$$One can observe that
$$\begin{aligned} {[s_L(x^*)]}_i \in \left\{ \begin{array}{l@{\quad }l} {[l_i,u_i]} &{} \ \text{ if } \ L > \theta , \\ (-\infty ,l_i] &{} \ \text{ if } \ 0 < L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ {[u_i, \infty )} &{} \ \text{ if } \ 0 < L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$It follows that
$$\begin{aligned} {[\Pi _\mathcal{B }(s_L(x^*))]}_i =\left\{ \begin{array}{ll} {[s_L(x^*)]}_i &{} \ \text{ if } \ L > \theta , \\ l_i &{} \ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ u_i &{} \ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$Using this relation and \({[s_L(x^*)]}_i = -\frac{1}{L} \nabla _i f(x^*)\), we have
$$\begin{aligned} h_i(L;I) = \left\{ \begin{array}{ll} \frac{1}{L^2} {[\nabla _i f(x^*)]}^2 - \frac{2\lambda }{L}, &{} \ \text{ if } \ L > \theta , \\ -l^2_i - \frac{2l_i\nabla _i f(x^*)}{L}- \frac{2\lambda }{L}, &{}\ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ -u^2_i - \frac{2u_i\nabla _i f(x^*)}{L}- \frac{2\lambda }{L}, &{}\ \text{ if } \ 0<L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$Multiplying the first branch by \(L^2\) and either remaining branch by \(L\) yields an equation affine in \(L\), so each branch contributes at most one root; since for a fixed sign of \(\nabla _i f(x^*)\) only two branches apply, \(h_i(L;I)=0\) has at most two positive roots \(L\).
- (2b) \(i\notin I\) and \({[\nabla f(x^*)]}_i < 0\): as noted above, \(x^*_i=u_i\) in this subcase. One can observe that for all \(L>0\),
$$\begin{aligned} {[s_L(x^*)]}_i = x^*_i - \frac{1}{L} \nabla _i f(x^*) > u_i, \end{aligned}$$and hence \([\Pi _\mathcal{B }(s_L(x^*))]_i=u_i\). Using this relation and \(x^*_i=u_i\), we have
$$\begin{aligned} h_i(L;I) = u^2_i - \frac{2u_i\nabla _if(x^*)}{L} - \frac{2\lambda }{L}, \ \ \ \forall L>0. \end{aligned}$$Thus, \(h_i(L;I)=0\) has at most one positive root \(L\).
- (2c) \(i\notin I\) and \({[\nabla f(x^*)]}_i > 0\): as noted above, \(x^*_i=l_i\) in this subcase. By an argument similar to that for subcase (2b), one obtains
$$\begin{aligned} h_i(L;I) = l^2_i - \frac{2l_i\nabla _if(x^*)}{L} - \frac{2\lambda }{L}, \ \ \ \forall L>0. \end{aligned}$$Hence, \(h_i(L;I)=0\) has at most one positive root \(L\).
Combining all cases above, we obtain the result as desired.
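As an illustrative numerical sanity check (not part of the paper), the root-count bounds can be probed for a concrete strongly convex quadratic. All data below (box bounds, \(c\), \(\lambda \), \(I\)) are arbitrary choices, and \(h_i\) is evaluated from the expression \({[\Pi _\mathcal{B }(s_L(x^*))]}_i(2{[s_L(x^*)]}_i-{[\Pi _\mathcal{B }(s_L(x^*))]}_i)-2\lambda /L\), which reproduces each branch of the case analysis above:

```python
import numpy as np

# Illustrative sanity check (not from the paper): for the strongly convex
# quadratic f(x) = 0.5*||x - c||^2 on the box B = [l, u], count sign changes
# of L -> h_i(L; I) over a grid.  All concrete data here are arbitrary.
l = np.array([-1.0, -1.0])   # box lower bounds
u = np.array([1.0, 1.0])     # box upper bounds
c = np.array([0.5, -0.3])    # f(x) = 0.5 * ||x - c||^2, so grad f(x) = x - c
lam = 0.1                    # l0-regularization weight lambda
I = [0]                      # components of x forced to zero in B_I

proj_box = lambda x: np.clip(x, l, u)   # Pi_B, the box projection
x_star = proj_box(c.copy())             # argmin of f over B_I: zero on I,
x_star[I] = 0.0                         # clipped unconstrained min elsewhere

def h(L, i):
    s = x_star - (x_star - c) / L       # s_L(x*) = x* - grad f(x*) / L
    p = proj_box(s)                     # Pi_B(s_L(x*))
    return p[i] * (2.0 * s[i] - p[i]) - 2.0 * lam / L

grid = np.linspace(0.01, 5.0, 2000)
sign_changes = []
for i in range(len(c)):
    vals = np.sign([h(L, i) for L in grid])
    sign_changes.append(int(np.sum(vals[:-1] != vals[1:])))
print(sign_changes)
```

For this data each \(h_i(\cdot \,;I)\) crosses zero exactly once on the grid, within the proven bounds (at most two crossings for \(i\in I\), at most one for \(i\notin I\)).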
Cite this article
Lu, Z. Iterative hard thresholding methods for \(l_0\) regularized convex cone programming. Math. Program. 147, 125–154 (2014). https://doi.org/10.1007/s10107-013-0714-4
Keywords
- Sparse approximation
- Iterative hard thresholding method
- \(l_0\) Regularization
- Box constrained convex programming
- Convex cone programming