Abstract
In this paper we consider \(l_0\) regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving \(l_0\) regularized box constrained convex programming, and show that the sequence generated by these methods converges to a local minimizer. We also establish the iteration complexity of the IHT method for finding an \(\epsilon \)-local-optimal solution. We then propose a method for solving \(l_0\) regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation, and establish its iteration complexity for finding an \(\epsilon \)-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and show that every accumulation point of the generated sequence is a local minimizer of the problem.
Notes
Such a convergence result of the IHT method when applied to \(l_0\)-regularized and -constrained least squares problems was already established by Blumensath and Davies [5].
\(\phi \) is strongly convex with modulus \(\sigma >0\) if \(\phi (\cdot ) - \frac{\sigma }{2}\Vert \cdot \Vert ^2\) is convex.
The existence of such \(L\) is proven in the “Appendix”.
References
Bahmani, S., Raj, B., Boufounos, P.: Greedy sparsity-constrained optimization. J. Mach. Learn. Res. 14, 807–841 (2013)
Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988)
Blumensath, T.: Accelerated iterative hard thresholding. Signal Process. 92(3), 752–756 (2012)
Blumensath, T.: Compressed sensing with nonlinear observations and related nonlinear optimization problems. IEEE Trans. Inf. Theory 59(6), 3466–3474 (2013)
Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)
Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)
Blumensath, T., Davies, M.E.: Normalized iterative hard thresholding: guaranteed stability and performance. IEEE J. Sel. Top. Signal Process. 4(2), 298–309 (2010)
Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
Cevher, V.: On accelerated hard thresholding methods for sparse approximation. In: Proceedings of SPIE, Conference on Wavelets and Sparsity XIV. San Diego, CA (2011)
Dempster, A.: Covariance selection. Biometrics 28, 157–175 (1978)
Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
Foucart, S.: Hard thresholding pursuit: an algorithm for compressive sensing. SIAM J. Numer. Anal. 49(6), 2543–2563 (2011)
Herrity, K.K., Gilbert, A.C., Tropp, J.A.: Sparse approximation via iterative thresholding. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2006)
Jain, P., Meka, R., Dhillon, I.: Guaranteed rank minimization via singular value projection. In: Advances in Neural Information Processing Systems, pp. 937–945 (2010)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138, 115–139 (2013)
Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. To appear in SIAM J. Optim. (2013)
Maleki, A.: Coherence analysis of iterative thresholding algorithms. In: Proceedings of 47th Annual Allerton Conference on Communication, Control, and Computing, pp. 236–243 (2009)
Mallat, S., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, MA (2004)
Ng, A.Y.: Feature selection, \(l_1\) vs. \(l_2\) regularization, and rotational invariance. In: Proceedings of the Twenty-First International Conference on Machine learning (ICML), pp. 72–85 (2004)
Nikolova, M.: Description of the minimizers of least squares regularized with \(l_0\) norm. Report HAL-00723812, CMLA-CNRS ENS Cachan, France (2011)
Tanner, J., Wei, K.: Normalized iterative hard thresholding for matrix completion. To appear in SIAM J. Sci. Comput. (2012)
Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)
Acknowledgments
The author is grateful to Ting Kei Pong for proofreading and suggestions. He would also like to thank the anonymous referees for pointing out many relevant references and for suggestions to improve the presentation.
Additional information
This work was supported in part by an NSERC Discovery Grant. Part of this work was conducted during the author’s sabbatical leave in the Department of Industrial and Systems Engineering at Texas A&M University. The author would like to thank them for hosting his visit.
Appendix
In this appendix we show that if \(f\) is smooth and strongly convex, there exists \(L>L_f\) such that \(\alpha >0\), which is required in Theorem 3.5. In particular, we show that there are at most \(3n2^{n-1}\) values of \(L\) for which \(\alpha =0\); in other words, \(\alpha >0\) for all \(L>0\) except at most \(3n2^{n-1}\) points. Therefore, \(\alpha >0\) holds almost surely for a randomly chosen \(L\), and an arbitrary \(L>L_f\) is usually sufficient for a practical implementation of the IHT method for solving problem (12).
To this end, assume that \(f\) is smooth and strongly convex, as imposed in Theorem 3.5, and let \(\alpha \) be defined in (29). We now show that \(\alpha >0\) for all \(L>0\) except at most \(3n2^{n-1}\) values. For simplicity, we only consider the case where \(-\infty <l_i<0\) and \(0<u_i<\infty \) for all \(i\); the proof is similar for the other cases. Let \(\mathcal{B }_I,\,\Pi _\mathcal{B }(\cdot )\) and \(s_L(\cdot )\) be defined in (15)–(17), respectively. Given any \(I \subseteq \{1,\ldots ,n\}\) and \(1 \le i \le n\), let
$$\begin{aligned} h_i(L;I) = {[\Pi _\mathcal{B }(s_L(x^*))]}_i \left( 2{[s_L(x^*)]}_i - {[\Pi _\mathcal{B }(s_L(x^*))]}_i\right) - \frac{2\lambda }{L}, \end{aligned}$$
where \(x^* \in {\mathrm{Arg}}\min \{f(x): x\in \mathcal{B }_I\}\). Clearly, \(h_i(L;I)\) is well-defined since \(x^*\) is unique for each \(I\) due to the strong convexity of \(f\). Notice that \(\alpha \) as defined in (29) can be rewritten as
As we will show below, for each \(I \subseteq \{1,\ldots ,n\}\), \(h_i(L;I)=0\) has at most two positive roots \(L\) for every \(i\in I\) and at most one positive root \(L\) for any \(i \notin I\). This together with (66) implies that for each \(I \subseteq \{1,\ldots ,n\}\) there exist at most
$$\begin{aligned} 2|I| + (n-|I|) = n+|I| \end{aligned}$$values of \(L\) such that \(\alpha =0\), where \(|I|\) denotes the size of \(I\). Therefore, the total number of values of \(L\) such that \(\alpha =0\) is at most
$$\begin{aligned} \sum \limits _{I \subseteq \{1,\ldots ,n\}} \left( n+|I|\right) = \sum \limits _{k=0}^{n} {n \choose k} (n+k) = 3n2^{n-1}. \end{aligned}$$
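As a quick brute-force check (illustrative only, not part of the paper), one can count \(2|I|+(n-|I|)=n+|I|\) potential roots per subset \(I\) and sum over all \(I \subseteq \{1,\ldots ,n\}\) for small \(n\), confirming the total \(3n2^{n-1}\):

```python
from itertools import combinations

# Brute-force check (illustrative, not from the paper): summing the per-subset
# bound 2|I| + (n - |I|) = n + |I| over all subsets I of {1, ..., n} should
# equal 3n * 2^(n-1), the total claimed in the appendix.
def total_root_bound(n):
    return sum(n + k                                # n + |I| roots per subset
               for k in range(n + 1)                # subset size |I| = k
               for _ in combinations(range(n), k))  # each subset of size k

for n in range(1, 8):
    assert total_root_bound(n) == 3 * n * 2 ** (n - 1)
print("bound verified for n = 1, ..., 7")
```

The identity also follows analytically from \(\sum _k {n \choose k}(n+k) = n2^n + n2^{n-1}\).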
It remains to show that for each \(I \subseteq \{1,\ldots ,n\},\,h_i(L;I)=0\) has at most two positive roots \(L\) for every \(i\in I\) and at most one positive root \(L\) for any \(i \notin I\). We prove this result by considering all possible cases below. Let \(i\in \{1,\ldots ,n\}\) and \(x^* =\arg \min \{f(x): x\in \mathcal{B }_I\}\).
- (1) \({[\nabla f(x^*)]}_i=0\): one can see that \({[s_L(x^*)]}_i=x^*_i\). Hence, \({[\Pi _{\mathcal{B }}(s_L(x^*))]}_i= x^*_i\) and \(h_i(L;I)=(x^*_i)^2-\frac{2\lambda }{L}\). It follows that \(h_i(L;I)=0\) has at most one positive root \(L\).
- (2) \({[\nabla f(x^*)]}_i \ne 0\): by the first-order optimality condition for \(x^* = \arg \min \{f(x): \ x\in \mathcal{B }_I\}\), one can observe that if \(i \notin I\) and \({[\nabla f(x^*)]}_i < 0\), then \(x^*_i=u_i\); if \(i \notin I\) and \({[\nabla f(x^*)]}_i > 0\), then \(x^*_i=l_i\). Hence, this case can be divided into three subcases:
- (2a) \(i\in I\): it follows that \(x^*_i=0\) and \({[s_L(x^*)]}_i = -\frac{1}{L} \nabla _i f(x^*)\). Let
$$\begin{aligned} \theta = \max \left( -\frac{\nabla _i f(x^*)}{l_i},-\frac{\nabla _i f(x^*)}{u_i}\right) \!. \end{aligned}$$One can observe that
$$\begin{aligned} {[s_L(x^*)]}_i \in \left\{ \begin{array}{l@{\quad }l} {[l_i,u_i]} &{} \ \text{ if } \ L > \theta , \\ (-\infty ,l_i] &{} \ \text{ if } \ 0 < L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ {[u_i, \infty )} &{} \ \text{ if } \ 0 < L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$It follows that
$$\begin{aligned} {[\Pi _\mathcal{B }(s_L(x^*))]}_i =\left\{ \begin{array}{ll} {[s_L(x^*)]}_i &{} \ \text{ if } \ L > \theta , \\ l_i &{} \ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ u_i &{} \ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$Using this relation and \({[s_L(x^*)]}_i = -\frac{1}{L} \nabla _i f(x^*)\), we have
$$\begin{aligned} h_i(L;I) = \left\{ \begin{array}{ll} \frac{1}{L^2} {[\nabla _i f(x^*)]}^2 - \frac{2\lambda }{L}, &{} \ \text{ if } \ L > \theta , \\ -l^2_i - \frac{2l_i\nabla _i f(x^*)}{L}- \frac{2\lambda }{L}, &{}\ \text{ if } \ 0< L \le \theta \ \text{ and } \ \nabla _i f(x^*)>0, \\ -u^2_i - \frac{2u_i\nabla _i f(x^*)}{L}- \frac{2\lambda }{L}, &{}\ \text{ if } \ 0<L \le \theta \ \text{ and } \ \nabla _i f(x^*)<0. \end{array}\right. \end{aligned}$$Multiplying the first branch by \(L^2\) and either remaining branch by \(L\) yields an equation affine in \(L\), so each branch contributes at most one root; since for a fixed sign of \(\nabla _i f(x^*)\) only two branches apply, \(h_i(L;I)=0\) has at most two positive roots \(L\).
- (2b) \(i\notin I\) and \({[\nabla f(x^*)]}_i < 0\): as noted above, \(x^*_i=u_i\) in this subcase. One can observe that for all \(L>0\),
$$\begin{aligned} {[s_L(x^*)]}_i = x^*_i - \frac{1}{L} \nabla _i f(x^*) > u_i, \end{aligned}$$and hence \([\Pi _\mathcal{B }(s_L(x^*))]_i=u_i\). Using this relation and \(x^*_i=u_i\), we have
$$\begin{aligned} h_i(L;I) = u^2_i - \frac{2u_i\nabla _if(x^*)}{L} - \frac{2\lambda }{L}, \ \ \ \forall L>0. \end{aligned}$$Thus, \(h_i(L;I)=0\) has at most one positive root \(L\).
- (2c) \(i\notin I\) and \({[\nabla f(x^*)]}_i > 0\): as noted above, \(x^*_i=l_i\) in this subcase. By an argument similar to that for subcase (2b), one obtains
$$\begin{aligned} h_i(L;I) = l^2_i - \frac{2l_i\nabla _if(x^*)}{L} - \frac{2\lambda }{L}, \ \ \ \forall L>0. \end{aligned}$$Hence, \(h_i(L;I)=0\) has at most one positive root \(L\).
Combining all cases above, we obtain the result as desired.
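As an illustrative numerical sanity check (not part of the paper), the root-count bounds can be probed for a concrete strongly convex quadratic. All data below (box bounds, \(c\), \(\lambda \), \(I\)) are arbitrary choices, and \(h_i\) is evaluated from the expression \({[\Pi _\mathcal{B }(s_L(x^*))]}_i(2{[s_L(x^*)]}_i-{[\Pi _\mathcal{B }(s_L(x^*))]}_i)-2\lambda /L\), which reproduces each branch of the case analysis above:

```python
import numpy as np

# Illustrative sanity check (not from the paper): for the strongly convex
# quadratic f(x) = 0.5*||x - c||^2 on the box B = [l, u], count sign changes
# of L -> h_i(L; I) over a grid.  All concrete data here are arbitrary.
l = np.array([-1.0, -1.0])   # box lower bounds
u = np.array([1.0, 1.0])     # box upper bounds
c = np.array([0.5, -0.3])    # f(x) = 0.5 * ||x - c||^2, so grad f(x) = x - c
lam = 0.1                    # l0-regularization weight lambda
I = [0]                      # components of x forced to zero in B_I

proj_box = lambda x: np.clip(x, l, u)   # Pi_B, the box projection
x_star = proj_box(c.copy())             # argmin of f over B_I: zero on I,
x_star[I] = 0.0                         # clipped unconstrained min elsewhere

def h(L, i):
    s = x_star - (x_star - c) / L       # s_L(x*) = x* - grad f(x*) / L
    p = proj_box(s)                     # Pi_B(s_L(x*))
    return p[i] * (2.0 * s[i] - p[i]) - 2.0 * lam / L

grid = np.linspace(0.01, 5.0, 2000)
sign_changes = []
for i in range(len(c)):
    vals = np.sign([h(L, i) for L in grid])
    sign_changes.append(int(np.sum(vals[:-1] != vals[1:])))
print(sign_changes)
```

For this data each \(h_i(\cdot \,;I)\) crosses zero exactly once on the grid, within the proven bounds (at most two crossings for \(i\in I\), at most one for \(i\notin I\)).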
Cite this article
Lu, Z. Iterative hard thresholding methods for \(l_0\) regularized convex cone programming. Math. Program. 147, 125–154 (2014). https://doi.org/10.1007/s10107-013-0714-4
Keywords
- Sparse approximation
- Iterative hard thresholding method
- \(l_0\) Regularization
- Box constrained convex programming
- Convex cone programming