
Signal Processing

Volume 91, Issue 7, July 2011, Pages 1505-1526

Review
Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms

https://doi.org/10.1016/j.sigpro.2011.01.012

Abstract

In this paper, we survey and compare different algorithms that, given an overcomplete dictionary of elementary functions, solve the problem of simultaneous sparse signal approximation with a common sparsity profile induced by an $\ell_p-\ell_q$ mixed-norm. Such a problem is also known in the statistical learning community as the group lasso problem. We have gathered and detailed different algorithmic results concerning these two equivalent approximation problems. We have also enriched the discussion by providing relations between several algorithms. Experimental comparisons of the detailed algorithms have also been carried out. The main lesson learned from these experiments is that, depending on the performance measure (computational complexity, sparsity recovery or mean-square error), greedy approaches and iterative reweighted algorithms are the most efficient.

Introduction

Sparse signal approximation has attracted considerable interest for several years now. This interest stems from the frequent need of practitioners to represent data in the most parsimonious way.

Recently, researchers have focused their efforts on a natural extension of the sparse approximation problem, namely the problem of finding jointly sparse representations of multiple signal vectors. This problem is also known as simultaneous sparse approximation and can be stated as follows. Suppose we have several signals describing the same phenomenon, each contaminated by noise. We want to find the sparsest approximation of each signal using the same set of elementary functions. Hence, the problem consists of finding the best approximation of each signal while controlling the number of functions involved in all the approximations.

Such a situation arises in many different application domains such as sensor networks signal processing [30], [11], neuroelectromagnetic imaging [22], [37], [59], source localization [31], image restoration [17], distributed compressed sensing [27] and signal and image processing [24], [49].

Formally, the problem of simultaneous sparse approximation is the following. Suppose that we have measured $L$ signals $\{s_i\}_{i=1}^{L}$, where each signal is of the form $s_i = \Phi c_i + \varepsilon$, with $s_i \in \mathbb{R}^N$, $\Phi \in \mathbb{R}^{N \times M}$ a matrix of unit-norm elementary functions, $c_i \in \mathbb{R}^M$ a weighting vector and $\varepsilon$ a noise vector. $\Phi$ will be denoted in the sequel as the dictionary matrix. Since we have several signals, the overall measurements can be written as
$$S = \Phi C + E \qquad (1)$$
where $S = [s_1\ s_2\ \cdots\ s_L]$ is the signal matrix, $C = [c_1\ c_2\ \cdots\ c_L]$ the coefficient matrix and $E$ a noise matrix. Note that in the sequel we adopt the following notation: $c_{i,\cdot}$ and $c_{\cdot,j}$ respectively denote the $i$-th row and the $j$-th column of the matrix $C$, and $c_{i,j}$ is the $i$-th element of the $j$-th column of $C$.
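
To make this measurement model concrete, the following sketch (not taken from the paper; the dimensions, noise level and the function name make_ssa_data are illustrative assumptions) generates a dictionary of unit-norm atoms and a coefficient matrix whose non-zero rows are shared by all $L$ signals:

```python
import numpy as np

def make_ssa_data(N=64, M=128, L=5, k=8, noise_std=0.05, seed=0):
    """Generate S = Phi C + E with a common sparsity profile across the L signals."""
    rng = np.random.default_rng(seed)
    # Dictionary of unit-norm elementary functions (atoms stored as columns)
    Phi = rng.standard_normal((N, M))
    Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)
    # Coefficient matrix with k non-zero rows shared by all columns
    C = np.zeros((M, L))
    active_rows = rng.choice(M, size=k, replace=False)
    C[active_rows, :] = rng.standard_normal((k, L))
    # Noisy measurements
    S = Phi @ C + noise_std * rng.standard_normal((N, L))
    return S, Phi, C, active_rows
```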

For the simultaneous sparse approximation (SSA) problem, the goal is then to recover the matrix $C$ given the signal matrix $S$ and the dictionary $\Phi$, under the hypothesis that all signals $s_i$ share the same sparsity profile. This hypothesis can also be translated into the coefficient matrix $C$ having a minimal number of non-zero rows. In order to measure the number of non-zero rows of $C$, a possible criterion is the so-called row-support or row-diversity measure of a coefficient matrix, defined as
$$\mathrm{rowsupp}(C) = \{\, i \in [1,\dots,M] : c_{i,k} \neq 0 \ \text{for some}\ k \,\}$$
The row-support of $C$ tells us which atoms of the dictionary have been used for building the signal matrix. Hence, if the cardinality of the row-support is lower than the dictionary cardinality, at least one atom of the dictionary has not been used for synthesizing the signal matrix. The row-$\ell_0$ pseudo-norm of a coefficient matrix can then be defined as $\|C\|_{\mathrm{row}\,0} = |\mathrm{rowsupp}(C)|$. According to this definition, the simultaneous sparse approximation problem can be stated as
$$\min_C \ \tfrac{1}{2}\|S - \Phi C\|_F^2 \quad \text{s.t.} \quad \|C\|_{\mathrm{row}\,0} \leq T \qquad (2)$$
where $\|\cdot\|_F$ is the Frobenius norm and $T$ a user-defined parameter that controls the sparsity of the solution. Note that the problem can also take the different form
$$\min_C \ \|C\|_{\mathrm{row}\,0} \quad \text{s.t.} \quad \tfrac{1}{2}\|S - \Phi C\|_F \leq \varepsilon \qquad (3)$$
In this latter formulation, the problem translates into minimizing the number of non-zero rows in the coefficient matrix $C$ while keeping control of the approximation error. Both problems (2) and (3) are appealing for their formulation clarity. However, similarly to the single-signal approximation case, solving these optimization problems is notoriously intractable because $\|\cdot\|_{\mathrm{row}\,0}$ is a discrete-valued function. Two ways of addressing the intractable problems (2) and (3) are possible: relaxing the problem by replacing the $\|\cdot\|_{\mathrm{row}\,0}$ function with a more tractable row-diversity measure, or resorting to suboptimal algorithms.
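
As a direct transcription of these definitions, the small sketch below computes the row-support and the row-$\ell_0$ pseudo-norm; the tolerance argument is our own addition for numerical use, not part of the paper's definition:

```python
import numpy as np

def rowsupp(C, tol=0.0):
    """Row-support: indices i such that c_{i,k} != 0 for some k (up to a tolerance)."""
    return np.flatnonzero(np.abs(C).max(axis=1) > tol)

def row0_norm(C, tol=0.0):
    """Row-l0 pseudo-norm: cardinality of the row-support of C."""
    return rowsupp(C, tol).size
```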

A large class of relaxed versions of $\|\cdot\|_{\mathrm{row}\,0}$ proposed in the literature is encompassed by the following form [52]:
$$J_{p,q}(C) = \sum_i \|c_{i,\cdot}\|_q^p \quad \text{with} \quad \|c_{i,\cdot}\|_q = \Big(\sum_j |c_{i,j}|^q\Big)^{1/q}$$
where typically $p \leq 1$ and $q \geq 1$. This penalty term can be interpreted as the $\ell_p$ quasi-norm of the sequence $\{\|c_{i,\cdot}\|_q\}_i$ raised to the power $p$. According to this relaxed version of the row-diversity measure, most of the algorithms proposed in the literature try to solve the relaxed problem
$$\min_C \ \tfrac{1}{2}\|S - \Phi C\|_F^2 + \lambda\, J_{p,q}(C) \qquad (4)$$
where $\lambda$ is another user-defined parameter that balances the approximation error and the sparsity-inducing penalty $J_{p,q}(C)$. The choice of $p$ and $q$ results in a compromise between row-support sparsity and convexity of the optimization problem. Indeed, problem (4) is known to be convex when $p,q \geq 1$, while it is known to produce a row-sparse matrix $C$ if $p \leq 1$ (due to the penalty function singularity at $C=0$ [15]).
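
For illustration, the penalty $J_{p,q}(C)$ can be evaluated directly from its definition; the sketch below is a straightforward transcription (the function name and default values are our own):

```python
import numpy as np

def J_pq(C, p=1.0, q=2.0):
    """Mixed-norm penalty J_{p,q}(C) = sum_i ||c_{i,.}||_q ** p."""
    row_norms = np.sum(np.abs(C) ** q, axis=1) ** (1.0 / q)   # ||c_{i,.}||_q for each row
    return np.sum(row_norms ** p)
```

For $p=1$, $q=2$ this is the group-lasso penalty, i.e. the sum of the $\ell_2$ norms of the rows of $C$.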

The simultaneous sparse approximation problem as described here is equivalent to several other problems studied in other research communities. Problem (4) can be reformulated so as to make clear its relation with other problems, denoted as the $\ell_p-\ell_q$ group lasso [61] or block-sparse regression [28] in statistics, and block-sparse signal recovery [13] in signal processing. Indeed, let us define
$$\tilde{s} = \begin{bmatrix} s_1 \\ \vdots \\ s_L \end{bmatrix}, \qquad \tilde{\Phi} = \begin{bmatrix} \Phi & & (0) \\ & \ddots & \\ (0) & & \Phi \end{bmatrix}, \qquad \tilde{c} = \begin{bmatrix} c_1 \\ \vdots \\ c_L \end{bmatrix}$$
then problem (4) can be equivalently rewritten as
$$\min_{\tilde{c}} \ \tfrac{1}{2}\|\tilde{s} - \tilde{\Phi}\tilde{c}\|_2^2 + \lambda\, J_{p,q}(\tilde{c})$$
where $J_{p,q}(\tilde{c}) = \sum_{i=1}^M \|\tilde{c}_{g_i}\|_q^p$, with $g_i$ being the set of indices in $\tilde{c}$ related to the $i$-th element of the dictionary matrix $\Phi$.
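
The equivalence of the data-fit terms in the two formulations can be checked numerically; the sketch below (the stacking order and the toy dimensions are our own assumptions) builds the block-diagonal dictionary and verifies that the two residual norms coincide:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
N, M, L = 16, 32, 3
Phi = rng.standard_normal((N, M))
C = rng.standard_normal((M, L))
S = rng.standard_normal((N, L))

# Stack the columns of S and C, and replicate Phi on the block diagonal
s_tilde = S.T.reshape(-1)            # [s_1; ...; s_L]
c_tilde = C.T.reshape(-1)            # [c_1; ...; c_L]
Phi_tilde = block_diag(*([Phi] * L))

# The matrix-form and vectorized data-fit terms coincide
lhs = np.linalg.norm(S - Phi @ C, 'fro') ** 2
rhs = np.linalg.norm(s_tilde - Phi_tilde @ c_tilde) ** 2
assert np.isclose(lhs, rhs)
```

With this stacking, the group $g_i$ associated with the $i$-th atom collects the entries of $\tilde{c}$ at positions $i, i+M, \dots, i+(L-1)M$, that is, the $i$-th row of $C$.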

As we have already stated, simultaneous sparse approximation and equivalent problems have been investigated by diverse communities and applied to various application problems, and the literature on these topics has grown rapidly over the last few years. The aim of this work is to gather results from different communities (statistics, signal processing and machine learning) so as to survey, analyze and compare different algorithms proposed for solving optimization problem (4) for different values of $p$ and $q$. In particular, instead of merely summarizing existing results, we enrich our survey with results such as proofs of convergence, formal relations between different algorithms and experimental comparisons that were not available in the literature. These experimental comparisons essentially focus on the computational complexity of the different methods, on their ability to recover the signal sparsity pattern, and on the quality of approximation they provide, evaluated through the mean-square error.

Note that recently there have been various works addressing simultaneous sparse approximation in the noiseless case [47], [45], and many works aiming at providing theoretical approximation guarantees in both the noiseless and noisy cases [6], [13], [33], [14], [25], [29]. Surveying these works is beyond the scope of this paper; we refer interested readers to these pointers and the references therein.

The most frequent case of problem (4) encountered in the literature is the one where $p=1$ and $q=2$. This case is the simplest one since it leads to a convex problem. We discuss different algorithms for solving this problem in Section 2. Then, we survey in Section 3 generic algorithms that are able to solve our approximation problems for different values of $p$ and $q$. These algorithms are essentially iterative reweighted $\ell_1$ or $\ell_2$ algorithms. Many works in the literature have focused on one algorithm for solving a particular case of problem (4). These specific algorithms are discussed in Section 4. Notably, we present some greedy algorithms and discuss a sparse Bayesian learning algorithm. The experimental comparisons we carried out aim at clarifying how each algorithm behaves in terms of computational complexity, sparsity recovery and approximation mean-square error. These results are described in Section 5. For the sake of reproducibility, the code used in this paper has been made freely available.1 Final discussions and conclusions are given in Section 6.

Section snippets

Solving the $\ell_1-\ell_2$ optimization problem

When $p=1$, $q=2$, optimization problem (4) becomes a particular problem named M-BP, for Multiple Basis Pursuit, in the sequel. It is a special case that deserves attention. Indeed, it seems to be the most natural extension of the so-called Lasso problem [50] or Basis Pursuit Denoising [7], since for $L=1$ it can easily be shown that problem (4) reduces to these two problems. The key point of this case is that it yields a convex optimization problem and thus it can benefit from all properties…
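
As a self-contained illustration of this convex case (not the specific block coordinate descent algorithm detailed in the paper), the following sketch applies a proximal-gradient iteration in which the proximal operator of the $\ell_1-\ell_2$ penalty is the row-wise block soft-thresholding; the step size, iteration count and function name are assumptions:

```python
import numpy as np

def mbp_prox_grad(S, Phi, lam, n_iter=500):
    """Proximal-gradient sketch for min_C 0.5 * ||S - Phi C||_F^2 + lam * sum_i ||c_{i,.}||_2."""
    M, L = Phi.shape[1], S.shape[1]
    C = np.zeros((M, L))
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2        # 1 / Lipschitz constant of the data-fit gradient
    for _ in range(n_iter):
        G = C - step * (Phi.T @ (Phi @ C - S))      # gradient step on the quadratic term
        row_norms = np.linalg.norm(G, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12))
        C = shrink * G                              # block (row-wise) soft-thresholding
    return C
```

Rows whose $\ell_2$ norm falls below the threshold are set exactly to zero, which is how this penalty induces a common sparsity profile across the columns of $C$.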

Generic algorithms for large classes of p and q

In this section, we present several algorithms that are able to solve the simultaneous sparse approximation problem for large classes of $p$ and $q$. In the first part of the section, we review the M-FOCUSS algorithm of Cotter et al. [8], which addresses the case where $0 < p \leq 1$ and $q=2$. In a second part, we make clear how the penalization term $J_{p,q}(C)$ with $p=1$ and $1 \leq q \leq 2$ is related to automatic relevance determination. From this novel insight, we then propose an iterative reweighted least-squares…
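
For intuition, here is a minimal sketch in the spirit of iterative reweighted least-squares updates such as M-FOCUSS for the $\ell_p-\ell_2$ penalized problem; the exact weight update, regularization handling and safeguards (row pruning, convergence test) used by the algorithms surveyed in the paper are simplified, and the function name and defaults are assumptions:

```python
import numpy as np

def irls_sketch(S, Phi, lam=1e-2, p=0.8, n_iter=50, eps=1e-10):
    """IRLS-style sketch for the l_p - l_2 penalized problem (0 < p <= 1, q = 2)."""
    N, M = Phi.shape
    L = S.shape[1]
    C = np.ones((M, L))                                # non-zero start so every row gets a weight
    for _ in range(n_iter):
        # Row weights derived from the current row norms (eps avoids division by zero)
        w = np.linalg.norm(C, axis=1) ** (1.0 - p / 2.0) + eps
        A = Phi * w                                    # A = Phi @ diag(w)
        # Regularized weighted least-squares step, mapped back through the weights
        C = w[:, None] * (A.T @ np.linalg.solve(A @ A.T + lam * np.eye(N), S))
    return C
```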

Specific case algorithms

In this section, we survey some algorithms that provide solutions to the SSA problem (4) when $p=0$. These algorithms are different in nature and provide different qualities of approximation. The first two algorithms we detail are greedy algorithms that have the flavor of Matching Pursuit [32]. The third one is based on sparse Bayesian learning and is related to the iterative reweighted $\ell_1$ algorithm [60].
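
For concreteness, the sketch below implements a greedy simultaneous pursuit of the Matching Pursuit family (an S-OMP-like variant); the atom-selection rule (largest norm of the residual correlations across signals) and the fixed number of selected atoms are our assumptions, not a transcription of the specific algorithms surveyed in the paper:

```python
import numpy as np

def somp_sketch(S, Phi, n_atoms):
    """Greedy sketch: select atoms shared by all signals, refit by least squares at each step."""
    M = Phi.shape[1]
    residual = S.copy()
    support = []
    for _ in range(n_atoms):
        # Score each atom by the norm of its correlations with all residual columns
        scores = np.linalg.norm(Phi.T @ residual, axis=1)
        scores[support] = -np.inf                      # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Orthogonal projection: least-squares refit of all signals on the current support
        coef, *_ = np.linalg.lstsq(Phi[:, support], S, rcond=None)
        residual = S - Phi[:, support] @ coef
    C = np.zeros((M, S.shape[1]))
    C[support, :] = coef
    return C, support
```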

Numerical experiments

Some computer simulations have been carried out in order to evaluate the algorithms described in the above sections. The results obtained from these numerical studies are detailed in this section.

Conclusions

This paper aimed at surveying and comparing different simultaneous sparse signal approximation algorithms available in the statistical learning and signal processing communities.

These algorithms usually solve a relaxed version of an intractable problem, where the relaxation is based on a sparsity-inducing mixed-norm on the approximation coefficients. In the first part of the paper, we have described several algorithms which address the most common SSA problem, which is a convex problem with an $\ell_1$…

Acknowledgments

We would like to thank the anonymous reviewers for their comments. This work was supported in part by the IST Program of the European Community, under the PASCAL2 Network of Excellence, IST-216886. This publication only reflects the authors' views. This work was also partly supported by grants from the ANR ClasSel 08-EMER and ASAP ANR-09-EMER-001.

References (64)

  • D. Wipf et al.

    Robust Bayesian estimation of the location, orientation and time course of multiple correlated neural sources using MEG

    NeuroImage

    (2010)
  • R.G. Baraniuk et al.

    Model-based compressive sensing

    IEEE Transactions on Information Theory

    (2010)
  • D. Bertsekas et al.

    Convex Analysis and Optimization

    (2003)
  • S. Boyd et al.

    Convex Optimization

    (2004)
  • E. Candès et al.

    Enhancing sparsity by reweighted $\ell_1$ minimization

    Journal of Fourier Analysis and Applications

    (2008)
  • R. Chartrand, W. Yin, Iteratively reweighted algorithms for compressive sensing, in: 33rd International Conference on...
  • J. Chen et al.

    Sparse representations for multiple measurement vectors (MMV) in an overcomplete dictionary

  • S. Chen et al.

    Atomic decomposition by basis pursuit

    SIAM Journal on Scientific Computing

    (1999)
  • S. Cotter et al.

    Sparse solutions to linear inverse problems with multiple measurement vectors

    IEEE Transactions on Signal Processing

    (2005)
  • I. Daubechies et al.

    An iterative thresholding algorithm for linear inverse problems with a sparsity constraint

    Communications on Pure and Applied Mathematics

    (2004)
  • I. Daubechies et al.

    Iteratively reweighted least squares minimization for sparse recovery

    Communications on Pure and Applied Mathematics

    (2010)
  • M. Duarte, V. Cevher, R. Baraniuk, Model-based compressive sensing for signal ensembles, in: Proceedings of the 47th...
  • M. Elad

    Why simple shrinkage is still relevant for redundant representations?

    IEEE Transactions on Information Theory

    (2006)
  • Y. Eldar et al.

    Compressed sensing of block-sparse signals: uncertainty relations and efficient recovery

    IEEE Transactions on Signal Processing

    (2010)
  • Y. Eldar et al.

    Robust recovery of signals from a structured union of subspaces

    IEEE Transactions on Information Theory

    (2009)
  • J. Fan et al.

    Variable selection via nonconcave penalized likelihood and its oracle properties

    Journal of the American Statistical Association

    (2001)
  • M. Figueiredo, J. Bioucas-Dias, R. Nowak, Majorization-minimization algorithms for wavelet-based image restoration....
  • M. Fornasier et al.

    Recovery algorithms for vector valued data with joint sparsity constraints

    SIAM Journal on Numerical Analysis

    (2008)
  • J. Friedman et al.

    Pathwise coordinate optimization

    The Annals of Applied Statistics

    (2007)
  • G. Gasso et al.

    Recovering sparse signals with a certain family of non-convex penalties and DC programming

    IEEE Transactions on Signal Processing

    (2009)
  • M. Girolami et al.

    Hierarchic Bayesian models for kernel learning

  • Y. Grandvalet

    Least absolute shrinkage is equivalent to quadratic penalization

