
Iterative hard thresholding methods for \(l_0\) regularized convex cone programming

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

In this paper we consider \(l_0\) regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving \(l_0\) regularized box constrained convex programming, and we show that the sequence generated by these methods converges to a local minimizer. We also establish the iteration complexity of the IHT method for finding an \(\epsilon \)-local-optimal solution. We then propose a method for solving \(l_0\) regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation, and we establish its iteration complexity for finding an \(\epsilon \)-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and we show that every accumulation point is a local minimizer of the problem.
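
To make the basic building block concrete, the sketch below shows one generic iterative hard thresholding step for a box constrained problem \(\min \{f(x)+\lambda \Vert x\Vert _0: \ l\le x\le u\}\) with \(l\le 0\le u\): a gradient step followed by a componentwise choice between the box-projected value (charged \(\lambda \) if nonzero) and zero. This is an illustrative sketch only; the names iht_step and grad_f are hypothetical, and the paper's precise update rule, choice of \(L\), and convergence conditions are those given in the full text.

```python
import numpy as np

def iht_step(x, grad_f, L, lam, l, u):
    """One generic hard-thresholding step for min f(x) + lam*||x||_0 over l <= x <= u.

    Sketch only (assumes 0 lies in the box): take the gradient step
    s = x - grad_f(x)/L, then for each coordinate keep the box-projected
    value (paying lam if it is nonzero) or set it to zero, whichever is
    cheaper in (L/2)*(z_i - s_i)^2 + lam*1[z_i != 0].
    """
    s = x - grad_f(x) / L
    proj = np.clip(s, l, u)                                   # projection onto the box
    keep_cost = 0.5 * L * (proj - s) ** 2 + lam * (proj != 0)
    zero_cost = 0.5 * L * s ** 2
    return np.where(keep_cost <= zero_cost, proj, 0.0)

# Toy usage: f(x) = 0.5*||x - c||^2, so grad_f(x) = x - c.
c = np.array([0.9, 0.05, -0.7])
x = np.zeros(3)
for _ in range(50):
    x = iht_step(x, lambda z: z - c, L=1.5, lam=0.1,
                 l=-np.ones(3), u=np.ones(3))
print(x)  # the small middle component is thresholded to zero
```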

Notes

  1. Such a convergence result of the IHT method when applied to \(l_0\)-regularized and -constrained least squares problems was already established by Blumensath and Davies [5].

  2. \(\phi \) is strongly convex with modulus \(\sigma >0\) if \(\phi (\cdot ) - \frac{\sigma }{2}\Vert \cdot \Vert ^2\) is convex.

  3. The existence of such \(L\) is proven in the “Appendix”.

References

  1. Bahmani, S., Raj, B., Boufounos, P.: Greedy sparsity-constrained optimization. J. Mach. Learn. Res. 14, 807–841 (2013)

  2. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988)

  3. Blumensath, T.: Accelerated iterative hard thresholding. Signal Process. 92(3), 752–756 (2012)

  4. Blumensath, T.: Compressed sensing with nonlinear observations and related nonlinear optimization problems. IEEE Trans. Inf. Theory 59(6), 3466–3474 (2013)

  5. Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)

  6. Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)

  7. Blumensath, T., Davies, M.E.: Normalized iterative hard thresholding: guaranteed stability and performance. IEEE J. Sel. Top. Signal Process. 4(2), 298–309 (2010)

  8. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

  9. Cevher, V.: On accelerated hard thresholding methods for sparse approximation. In: Proceedings of SPIE, Conference on Wavelets and Sparsity XIV. San Diego, CA (2011)

  10. Dempster, A.: Covariance selection. Biometrics 28, 157–175 (1972)

  11. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

  12. Foucart, S.: Hard thresholding pursuit: an algorithm for compressive sensing. SIAM J. Numer. Anal. 49(6), 2543–2563 (2011)

  13. Herrity, K.K., Gilbert, A.C., Tropp, J.A.: Sparse approximation via iterative thresholding. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2006)

  14. Jain, P., Meka, R., Dhillon, I.: Guaranteed rank minimization via singular value projection. Neural Inf. Process. Syst., 937–945 (2010)

  15. Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138, 115–139 (2013)

  16. Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. To appear in SIAM J. Optim. (2013)

  17. Maleki, A.: Coherence analysis of iterative thresholding algorithms. In: Proceedings of 47th Annual Allerton Conference on Communication, Control, and Computing, pp. 236–243 (2009)

  18. Mallat, S., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)

  19. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, MA (2004)

  20. Ng, A.Y.: Feature selection, \(l_1\) vs. \(l_2\) regularization, and rotational invariance. In: Proceedings of the Twenty-First International Conference on Machine learning (ICML), pp. 72–85 (2004)

  21. Nikolova, M.: Description of the minimizers of least squares regularized with \(l_0\) norm. Report HAL-00723812, CMLA-CNRS ENS Cachan, France (2011)

  22. Tanner, J., Wei, K.: Normalized iterative hard thresholding for matrix completion. Accepted by SIAM J. Sci. Comput. (2012)

  23. Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)


Acknowledgments

The author is grateful to Ting Kei Pong for proofreading and suggestions. He would also like to thank the anonymous referees for pointing out many relevant references and for suggestions to improve the presentation.

Author information

Correspondence to Zhaosong Lu.

Additional information

This work was supported in part by an NSERC Discovery Grant. Part of this work was conducted during the author’s sabbatical leave in the Department of Industrial and Systems Engineering at Texas A&M University. The author would like to thank them for hosting his visit.

Appendix

In this appendix we show that if \(f\) is smooth and strongly convex, there exists \(L>L_f\) such that \(\alpha >0\), which is required in Theorem 3.5. In particular, we will show that there are at most \(3n2^{n-1}\) values of \(L\) for which \(\alpha =0\); in other words, \(\alpha >0\) for all \(L>0\) except at most \(3n2^{n-1}\) points. Therefore, \(\alpha >0\) holds almost surely for a randomly chosen \(L\), and an arbitrary \(L>L_f\) may be sufficient for practical implementation of the IHT method for solving problem (12).

To this end, assume that \(f\) is smooth and strongly convex, as imposed in Theorem 3.5, and let \(\alpha \) be defined in (29). We now show that \(\alpha >0\) for all \(L>0\) except at most \(3n2^{n-1}\) values. For simplicity, we only consider the case where \(-\infty <l_i<0\) and \(0<u_i<\infty \) for all \(i\); the proof is similar for the other cases. Let \(\mathcal{B }_I,\,\Pi _\mathcal{B }(\cdot )\) and \(s_L(\cdot )\) be defined in (15)–(17), respectively. Given any \(I \subseteq \{1,\ldots ,n\}\) and \(1 \le i \le n\), let

$$\begin{aligned} h_i(L;I):=\left[ s_L\left( x^*\right) \right] ^2_i -\left[ \Pi _{\mathcal{B }}\left( s_L\left( x^*\right) \right) -s_L\left( x^*\right) \right] ^2_i -\frac{2\lambda }{L}, \end{aligned}$$

where \(x^* \in {\mathrm{Arg}}\min \{f(x): x\in \mathcal{B }_I\}\). Clearly, \(h_i(L;I)\) is well-defined since \(x^*\) is unique for each \(I\) due to the strong convexity of \(f\). Notice that \(\alpha \) as defined in (29) can be rewritten as

$$\begin{aligned} \alpha = \min \limits _{I \subseteq \{1,\ldots ,n\}} \min \limits _{i} |h_i(L;I)|. \end{aligned}$$
(66)

As we will later show, for each \(I \subseteq \{1,\ldots ,n\},\,h_i(L;I)=0\) has at most two positive roots \(L\) for every \(i\in I\) and at most one positive root \(L\) for any \(i \notin I\). This together with (66) implies that for each \(I \subseteq \{1,\ldots ,n\}\) there exist at most

$$\begin{aligned} n_I \ :=\ 2|I|+n-|I| \ = \ n+|I| \end{aligned}$$

values of \(L\) for which \(\alpha =0\), where \(|I|\) denotes the size of \(I\). Therefore, the total number of values of \(L\) for which \(\alpha =0\) is at most

$$\begin{aligned} \sum \limits _{I \subseteq \{1,\ldots ,n\}} n_I = \sum ^n_{|I|=0} n_I \left( \begin{array}{c} n \\ |I| \end{array} \right) = 3n2^{n-1}. \end{aligned}$$
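
Indeed, writing \(k=|I|\), the last equality follows from the standard binomial identities \(\sum _{k=0}^n \left( {\begin{array}{c}n\\ k\end{array}}\right) =2^n\) and \(\sum _{k=0}^n k\left( {\begin{array}{c}n\\ k\end{array}}\right) =n2^{n-1}\):

$$\begin{aligned} \sum ^n_{k=0} (n+k) \left( \begin{array}{c} n \\ k \end{array} \right) = n\sum ^n_{k=0} \left( \begin{array}{c} n \\ k \end{array} \right) + \sum ^n_{k=0} k \left( \begin{array}{c} n \\ k \end{array} \right) = n2^n + n2^{n-1} = 3n2^{n-1}. \end{aligned}$$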

It remains to show that for each \(I \subseteq \{1,\ldots ,n\},\,h_i(L;I)=0\) has at most two positive roots \(L\) for every \(i\in I\) and at most one positive root \(L\) for any \(i \notin I\). We prove this result by considering all possible cases below. Let \(i\in \{1,\ldots ,n\}\) and \(x^* =\arg \min \{f(x): x\in \mathcal{B }_I\}\).

  1. (1)

    \({[\nabla f(x^*)]}_i=0\): one can see that \({[s_L(x^*)]}_i=x^*_i\). Hence, \({[\Pi _{\mathcal{B }}(s_L(x^*))]}_i= x^*_i\) and \(h_i(L;I)=(x^*_i)^2-\frac{2\lambda }{L}\). It follows that \(h_i(L;I)=0\) has at most one positive root \(L\).

  2. (2)

    \({[\nabla f(x^*)]}_i \ne 0\): by the first-order optimality condition for \(x^* = \arg \min \{f(x): \ x\in \mathcal{B }_I\}\), one can observe that if \(i \notin I\) and \({[\nabla f(x^*)]}_i < 0\), then \(x^*_i=u_i\); if \(i \notin I\) and \({[\nabla f(x^*)]}_i > 0\), then \(x^*_i=l_i\). Hence, this case can be divided into three subcases:

    1. (2a)

      \(i\in I\): it follows that \(x^*_i=0\) and \({[s_L(x^*)]}_i = -\frac{1}{L} \nabla _i f(x^*)\). Let

      $$\begin{aligned} \theta = \max \left( -\frac{\nabla _i f(x^*)}{l_i},-\frac{\nabla _i f(x^*)}{u_i}\right) \!. \end{aligned}$$

      One can observe that

      $$\begin{aligned} {[s_L(x^*)]}_i \in \left\{ \begin{array}{l@{\quad }l} {[l_i,u_i]} &{} \ \text{ if } \ L > \theta , \\ (-\infty ,l_i] &{} \ \text{ if } \ 0 < L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ {[u_i, \infty )} &{} \ \text{ if } \ 0 < L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$

      It follows that

      $$\begin{aligned} {[\Pi _\mathcal{B }(s_L(x^*))]}_i =\left\{ \begin{array}{ll} {[s_L(x^*)]}_i &{} \ \text{ if } \ L > \theta , \\ l_i &{} \ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ u_i &{} \ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$

      Using this relation and \({[s_L(x^*)]}_i = -\frac{1}{L} \nabla _i f(x^*)\), we have

      $$\begin{aligned} h_i(L;I) = \left\{ \begin{array}{ll} \frac{1}{L^2} {[\nabla _i f(x^*)]}^2 - \frac{2\lambda }{L}, &{} \text{ if } \ L > \theta , \\ -l^2_i - \frac{2l_i\nabla _i f(x^*)}{L}- \frac{2\lambda }{L}, &{}\ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ -u^2_i - \frac{2u_i\nabla _i f(x^*)}{L}- \frac{2\lambda }{L}, &{}\ \text{ if } \ 0<L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$

      Therefore, one can see that \(h_i(L;I)=0\) has at most two positive roots \(L\): for \(L>\theta \) the equation reduces to \(2\lambda L = {[\nabla _i f(x^*)]}^2\), which has at most one positive root, while \(\nabla _i f(x^*)\) has a fixed sign, so only one of the remaining two branches applies and, being linear in \(1/L\), it contributes at most one more root.

    2. (2b)

      \(i\notin I\) and \({[\nabla f(x^*)]}_i < 0\): as mentioned above, \(x^*_i=u_i\) holds for this subcase. One can observe that for all \(L>0\),

      $$\begin{aligned} {[s_L(x^*)]}_i = x^*_i - \frac{1}{L} \nabla _i f(x^*) > u_i, \end{aligned}$$

      and hence \([\Pi _\mathcal{B }(s_L(x^*))]_i=u_i\). Using this relation and \(x^*_i=u_i\), we have

      $$\begin{aligned} h_i(L;I) = u^2_i - \frac{2u_i\nabla _if(x^*)}{L} - \frac{2\lambda }{L}, \ \ \ \forall L>0. \end{aligned}$$

      Thus, \(h_i(L;I)=0\) has at most one positive root \(L\).

    3. (2c)

      \(i\notin I\) and \([\nabla f(x^*)]_i > 0\): as mentioned above, \(x^*_i=l_i\) holds for this subcase. By an argument similar to that in subcase (2b), we have

      $$\begin{aligned} h_i(L;I) = l^2_i - \frac{2l_i\nabla _if(x^*)}{L} - \frac{2\lambda }{L}, \ \ \ \forall L>0. \end{aligned}$$

      Hence, \(h_i(L;I)=0\) has at most one positive root \(L\).

Combining all cases above, we obtain the result as desired.
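
As an illustration of the above argument, the following sketch checks numerically that \(\alpha >0\) for a randomly chosen \(L>L_f\) on a small instance. It assumes, for the purpose of this illustration only, that \(\mathcal{B }_I=\{x\in \mathcal{B }: x_i=0 \ \text{ for } \ i\in I\}\), that \(\Pi _\mathcal{B }\) is the projection onto the box and that \(s_L(x)=x-\frac{1}{L}\nabla f(x)\); the exact objects are those defined in (15)–(17). The toy data \(f(x)=\frac{1}{2}\Vert x-c\Vert ^2\), the box bounds and \(\lambda \) are arbitrary.

```python
import itertools
import numpy as np

# Toy check for f(x) = 0.5*||x - c||^2 (strongly convex, grad f(x) = x - c,
# L_f = 1).  The definitions of B_I, Pi_B and s_L used below are assumptions
# made for this illustration only.
rng = np.random.default_rng(0)
n, lam = 3, 0.1
l, u = -np.ones(n), np.ones(n)
c = rng.uniform(-2.0, 2.0, size=n)
L = 1.0 + rng.exponential()              # an arbitrary L > L_f = 1

def h(L, I):
    # x* = argmin{f(x): x in B_I}: f is separable, so clip c to the box
    # and zero out the coordinates indexed by I.
    x_star = np.clip(c, l, u)
    x_star[list(I)] = 0.0
    s = x_star - (x_star - c) / L        # s_L(x*)
    proj = np.clip(s, l, u)              # Pi_B(s_L(x*))
    return s ** 2 - (proj - s) ** 2 - 2.0 * lam / L

alpha = min(np.min(np.abs(h(L, I)))
            for k in range(n + 1)
            for I in itertools.combinations(range(n), k))
print(f"L = {L:.4f}, alpha = {alpha:.6f}")   # alpha > 0 except for finitely many L
```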

Cite this article

Lu, Z. Iterative hard thresholding methods for \(l_0\) regularized convex cone programming. Math. Program. 147, 125–154 (2014). https://doi.org/10.1007/s10107-013-0714-4
