
Signal Processing

Volume 91, Issue 7, July 2011, Pages 1505-1526

Review
Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms

https://doi.org/10.1016/j.sigpro.2011.01.012

Abstract

In this paper, we survey and compare different algorithms that, given an overcomplete dictionary of elementary functions, solve the problem of simultaneous sparse signal approximation with a common sparsity profile induced by an $\ell_p-\ell_q$ mixed-norm. Such a problem is also known in the statistical learning community as the group lasso problem. We have gathered and detailed different algorithmic results concerning these two equivalent approximation problems. We have also enriched the discussion by providing relations between several algorithms. Experimental comparisons of the detailed algorithms have also been carried out. The main lesson learned from these experiments is that, depending on the performance measure (computational complexity, sparsity recovery or mean-square error), greedy approaches and iterative reweighted algorithms are the most efficient.

Introduction

Sparse signal approximation has attracted considerable interest for several years now. This interest stems from the frequent need of practitioners to represent data in the most parsimonious way.

Recently, researchers have focused their efforts on a natural extension of the sparse approximation problem, namely the problem of finding jointly sparse representations of multiple signal vectors. This problem is also known as simultaneous sparse approximation and can be stated as follows. Suppose we have several signals describing the same phenomenon, each contaminated by noise. We want to find the sparsest approximation of each signal using the same set of elementary functions. Hence, the problem consists of finding the best approximation of each signal while controlling the number of functions involved in all the approximations.

Such a situation arises in many different application domains such as sensor networks signal processing [30], [11], neuroelectromagnetic imaging [22], [37], [59], source localization [31], image restoration [17], distributed compressed sensing [27] and signal and image processing [24], [49].

Formally, the problem of simultaneous sparse approximation is the following. Suppose that we have measured $L$ signals $\{s_i\}_{i=1}^{L}$, where each signal is of the form $s_i = \Phi c_i + \varepsilon$, with $s_i \in \mathbb{R}^N$, $\Phi \in \mathbb{R}^{N \times M}$ a matrix of unit-norm elementary functions, $c_i \in \mathbb{R}^M$ a weighting vector and $\varepsilon$ a noise vector. $\Phi$ will be denoted in the sequel as the dictionary matrix. Since we have several signals, the overall measurements can be written as
$$S = \Phi C + E \qquad (1)$$
where $S = [s_1\ s_2\ \cdots\ s_L]$ is the signal matrix, $C = [c_1\ c_2\ \cdots\ c_L]$ the coefficient matrix and $E$ a noise matrix. Note that in the sequel we adopt the following notation: $c_{i,\cdot}$ and $c_{\cdot,j}$ respectively denote the $i$-th row and the $j$-th column of the matrix $C$, and $c_{i,j}$ is the $i$-th element of the $j$-th column of $C$.
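
To make this measurement model concrete, the following sketch (not taken from the paper; the dimensions, noise level and the function name make_ssa_data are illustrative assumptions) generates a dictionary of unit-norm atoms and a coefficient matrix whose non-zero rows are shared by all $L$ signals:

```python
import numpy as np

def make_ssa_data(N=64, M=128, L=5, k=8, noise_std=0.05, seed=0):
    """Generate S = Phi C + E with a common sparsity profile across the L signals."""
    rng = np.random.default_rng(seed)
    # Dictionary of unit-norm elementary functions (atoms stored as columns)
    Phi = rng.standard_normal((N, M))
    Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)
    # Coefficient matrix with k non-zero rows shared by all columns
    C = np.zeros((M, L))
    active_rows = rng.choice(M, size=k, replace=False)
    C[active_rows, :] = rng.standard_normal((k, L))
    # Noisy measurements
    S = Phi @ C + noise_std * rng.standard_normal((N, L))
    return S, Phi, C, active_rows
```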

For the simultaneous sparse approximation (SSA) problem, the goal is then to recover the matrix $C$ given the signal matrix $S$ and the dictionary $\Phi$, under the hypothesis that all signals $s_i$ share the same sparsity profile. This hypothesis can also be translated into the coefficient matrix $C$ having a minimal number of non-zero rows. In order to measure the number of non-zero rows of $C$, a possible criterion is the so-called row-support or row-diversity measure of a coefficient matrix, defined as
$$\mathrm{rowsupp}(C) = \{\, i \in [1,\dots,M] : c_{i,k} \neq 0 \ \text{for some}\ k \,\}$$
The row-support of $C$ tells us which atoms of the dictionary have been used for building the signal matrix. Hence, if the cardinality of the row-support is lower than the dictionary cardinality, at least one atom of the dictionary has not been used for synthesizing the signal matrix. The row-$\ell_0$ pseudo-norm of a coefficient matrix can then be defined as $\|C\|_{\mathrm{row}\,0} = |\mathrm{rowsupp}(C)|$. According to this definition, the simultaneous sparse approximation problem can be stated as
$$\min_C \ \tfrac{1}{2}\|S - \Phi C\|_F^2 \quad \text{s.t.} \quad \|C\|_{\mathrm{row}\,0} \leq T \qquad (2)$$
where $\|\cdot\|_F$ is the Frobenius norm and $T$ a user-defined parameter that controls the sparsity of the solution. Note that the problem can also take the different form
$$\min_C \ \|C\|_{\mathrm{row}\,0} \quad \text{s.t.} \quad \tfrac{1}{2}\|S - \Phi C\|_F \leq \varepsilon \qquad (3)$$
In this latter formulation, the problem translates into minimizing the number of non-zero rows in the coefficient matrix $C$ while keeping control of the approximation error. Both problems (2) and (3) are appealing for their formulation clarity. However, similarly to the single-signal approximation case, solving these optimization problems is notoriously intractable because $\|\cdot\|_{\mathrm{row}\,0}$ is a discrete-valued function. Two ways of addressing the intractable problems (2) and (3) are possible: relaxing the problem by replacing the $\|\cdot\|_{\mathrm{row}\,0}$ function with a more tractable row-diversity measure, or resorting to suboptimal algorithms.
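
As a direct transcription of these definitions, the small sketch below computes the row-support and the row-$\ell_0$ pseudo-norm; the tolerance argument is our own addition for numerical use, not part of the paper's definition:

```python
import numpy as np

def rowsupp(C, tol=0.0):
    """Row-support: indices i such that c_{i,k} != 0 for some k (up to a tolerance)."""
    return np.flatnonzero(np.abs(C).max(axis=1) > tol)

def row0_norm(C, tol=0.0):
    """Row-l0 pseudo-norm: cardinality of the row-support of C."""
    return rowsupp(C, tol).size
```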

A large class of relaxed versions of $\|\cdot\|_{\mathrm{row}\,0}$ proposed in the literature is encompassed by the following form [52]:
$$J_{p,q}(C) = \sum_i \|c_{i,\cdot}\|_q^p \quad \text{with} \quad \|c_{i,\cdot}\|_q = \Big(\sum_j |c_{i,j}|^q\Big)^{1/q}$$
where typically $p \leq 1$ and $q \geq 1$. This penalty term can be interpreted as the $\ell_p$ quasi-norm of the sequence $\{\|c_{i,\cdot}\|_q\}_i$ raised to the power $p$. According to this relaxed version of the row-diversity measure, most of the algorithms proposed in the literature try to solve the relaxed problem
$$\min_C \ \tfrac{1}{2}\|S - \Phi C\|_F^2 + \lambda\, J_{p,q}(C) \qquad (4)$$
where $\lambda$ is another user-defined parameter that balances the approximation error and the sparsity-inducing penalty $J_{p,q}(C)$. The choice of $p$ and $q$ results in a compromise between row-support sparsity and convexity of the optimization problem. Indeed, problem (4) is known to be convex when $p,q \geq 1$, while it is known to produce a row-sparse matrix $C$ if $p \leq 1$ (due to the penalty function singularity at $C=0$ [15]).
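
For illustration, the penalty $J_{p,q}(C)$ can be evaluated directly from its definition; the sketch below is a straightforward transcription (the function name and default values are our own):

```python
import numpy as np

def J_pq(C, p=1.0, q=2.0):
    """Mixed-norm penalty J_{p,q}(C) = sum_i ||c_{i,.}||_q ** p."""
    row_norms = np.sum(np.abs(C) ** q, axis=1) ** (1.0 / q)   # ||c_{i,.}||_q for each row
    return np.sum(row_norms ** p)
```

For $p=1$, $q=2$ this is the group-lasso penalty, i.e. the sum of the $\ell_2$ norms of the rows of $C$.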

The simultaneous sparse approximation problem as described here is equivalent to several other problems studied in other research communities. Problem (4) can be reformulated so as to make clear its relation with other problems, denoted as the $\ell_p-\ell_q$ group lasso [61] or block-sparse regression [28] in statistics, and block-sparse signal recovery [13] in signal processing. Indeed, let us define
$$\tilde{s} = \begin{bmatrix} s_1 \\ \vdots \\ s_L \end{bmatrix}, \qquad \tilde{\Phi} = \begin{bmatrix} \Phi & & (0) \\ & \ddots & \\ (0) & & \Phi \end{bmatrix}, \qquad \tilde{c} = \begin{bmatrix} c_1 \\ \vdots \\ c_L \end{bmatrix}$$
then problem (4) can be equivalently rewritten as
$$\min_{\tilde{c}} \ \tfrac{1}{2}\|\tilde{s} - \tilde{\Phi}\tilde{c}\|_2^2 + \lambda\, J_{p,q}(\tilde{c})$$
where $J_{p,q}(\tilde{c}) = \sum_{i=1}^M \|\tilde{c}_{g_i}\|_q^p$, with $g_i$ being the set of indices in $\tilde{c}$ related to the $i$-th element of the dictionary matrix $\Phi$.
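
The equivalence of the data-fit terms in the two formulations can be checked numerically; the sketch below (the stacking order and the toy dimensions are our own assumptions) builds the block-diagonal dictionary and verifies that the two residual norms coincide:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
N, M, L = 16, 32, 3
Phi = rng.standard_normal((N, M))
C = rng.standard_normal((M, L))
S = rng.standard_normal((N, L))

# Stack the columns of S and C, and replicate Phi on the block diagonal
s_tilde = S.T.reshape(-1)            # [s_1; ...; s_L]
c_tilde = C.T.reshape(-1)            # [c_1; ...; c_L]
Phi_tilde = block_diag(*([Phi] * L))

# The matrix-form and vectorized data-fit terms coincide
lhs = np.linalg.norm(S - Phi @ C, 'fro') ** 2
rhs = np.linalg.norm(s_tilde - Phi_tilde @ c_tilde) ** 2
assert np.isclose(lhs, rhs)
```

With this stacking, the group $g_i$ associated with the $i$-th atom collects the entries of $\tilde{c}$ at positions $i, i+M, \dots, i+(L-1)M$, that is, the $i$-th row of $C$.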

As we have already stated, simultaneous sparse approximation and equivalent problems have been investigated by diverse communities and applied to various application problems, and the literature on these topics has grown rapidly over the last few years. The aim of this work is to gather results from different communities (statistics, signal processing and machine learning) so as to survey, analyze and compare different algorithms proposed for solving optimization problem (4) for different values of $p$ and $q$. In particular, instead of merely summarizing existing results, we enrich our survey with results such as proofs of convergence, formal relations between different algorithms and experimental comparisons that were not available in the literature. These experimental comparisons essentially focus on the computational complexity of the different methods, on their ability to recover the signal sparsity pattern, and on the quality of approximation they provide, evaluated through the mean-square error.

Note that recently there have been various works addressing simultaneous sparse approximation in the noiseless case [47], [45], and many works aiming at providing theoretical approximation guarantees in both the noiseless and noisy cases [6], [13], [33], [14], [25], [29]. Surveying these works is beyond the scope of this paper; we refer interested readers to these pointers and the references therein.

The most frequent case of problem (4) encountered in the literature is the one where $p=1$ and $q=2$. This case is the simplest one since it leads to a convex problem. We discuss different algorithms for solving this problem in Section 2. Then, we survey in Section 3 generic algorithms that are able to solve our approximation problems for different values of $p$ and $q$. These algorithms are essentially iterative reweighted $\ell_1$ or $\ell_2$ algorithms. Many works in the literature have focused on one algorithm for solving a particular case of problem (4). These specific algorithms are discussed in Section 4. Notably, we present some greedy algorithms and discuss a sparse Bayesian learning algorithm. The experimental comparisons we carried out aim at clarifying how each algorithm behaves in terms of computational complexity, sparsity recovery and approximation mean-square error. These results are described in Section 5. For the sake of reproducibility, the code used in this paper has been made freely available.1 Final discussions and conclusions are given in Section 6.

Section snippets

Solving the $\ell_1-\ell_2$ optimization problem

When $p=1$, $q=2$, optimization problem (4) becomes a particular problem named M-BP, for Multiple Basis Pursuit, in the sequel. It is a special case that deserves attention. Indeed, it seems to be the most natural extension of the so-called Lasso problem [50] or Basis Pursuit Denoising [7], since for $L=1$ it can easily be shown that problem (4) reduces to these two problems. The key point of this case is that it yields a convex optimization problem and thus it can benefit from all properties…
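
As a self-contained illustration of this convex case (not the specific block coordinate descent algorithm detailed in the paper), the following sketch applies a proximal-gradient iteration in which the proximal operator of the $\ell_1-\ell_2$ penalty is the row-wise block soft-thresholding; the step size, iteration count and function name are assumptions:

```python
import numpy as np

def mbp_prox_grad(S, Phi, lam, n_iter=500):
    """Proximal-gradient sketch for min_C 0.5 * ||S - Phi C||_F^2 + lam * sum_i ||c_{i,.}||_2."""
    M, L = Phi.shape[1], S.shape[1]
    C = np.zeros((M, L))
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2        # 1 / Lipschitz constant of the data-fit gradient
    for _ in range(n_iter):
        G = C - step * (Phi.T @ (Phi @ C - S))      # gradient step on the quadratic term
        row_norms = np.linalg.norm(G, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12))
        C = shrink * G                              # block (row-wise) soft-thresholding
    return C
```

Rows whose $\ell_2$ norm falls below the threshold are set exactly to zero, which is how this penalty induces a common sparsity profile across the columns of $C$.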

Generic algorithms for large classes of p and q

In this section, we present several algorithms that are able to solve the simultaneous sparse approximation problem for large classes of $p$ and $q$. In the first part of the section, we review the M-FOCUSS algorithm of Cotter et al. [8], which addresses the case where $0 < p \leq 1$ and $q=2$. In a second part, we make clear how the penalization term $J_{p,q}(C)$ with $p=1$ and $1 \leq q \leq 2$ is related to automatic relevance determination. From this novel insight, we then propose an iterative reweighted least-squares…
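
For intuition, here is a minimal sketch in the spirit of iterative reweighted least-squares updates such as M-FOCUSS for the $\ell_p-\ell_2$ penalized problem; the exact weight update, regularization handling and safeguards (row pruning, convergence test) used by the algorithms surveyed in the paper are simplified, and the function name and defaults are assumptions:

```python
import numpy as np

def irls_sketch(S, Phi, lam=1e-2, p=0.8, n_iter=50, eps=1e-10):
    """IRLS-style sketch for the l_p - l_2 penalized problem (0 < p <= 1, q = 2)."""
    N, M = Phi.shape
    L = S.shape[1]
    C = np.ones((M, L))                                # non-zero start so every row gets a weight
    for _ in range(n_iter):
        # Row weights derived from the current row norms (eps avoids division by zero)
        w = np.linalg.norm(C, axis=1) ** (1.0 - p / 2.0) + eps
        A = Phi * w                                    # A = Phi @ diag(w)
        # Regularized weighted least-squares step, mapped back through the weights
        C = w[:, None] * (A.T @ np.linalg.solve(A @ A.T + lam * np.eye(N), S))
    return C
```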

Specific case algorithms

In this section, we survey some algorithms that provide solutions to the SSA problem (4) when $p=0$. These algorithms are different in nature and provide different qualities of approximation. The first two algorithms we detail are greedy algorithms that have the flavor of Matching Pursuit [32]. The third one is based on sparse Bayesian learning and is related to the iterative reweighted $\ell_1$ algorithm [60].
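
For concreteness, the sketch below implements a greedy simultaneous pursuit of the Matching Pursuit family (an S-OMP-like variant); the atom-selection rule (largest norm of the residual correlations across signals) and the fixed number of selected atoms are our assumptions, not a transcription of the specific algorithms surveyed in the paper:

```python
import numpy as np

def somp_sketch(S, Phi, n_atoms):
    """Greedy sketch: select atoms shared by all signals, refit by least squares at each step."""
    M = Phi.shape[1]
    residual = S.copy()
    support = []
    for _ in range(n_atoms):
        # Score each atom by the norm of its correlations with all residual columns
        scores = np.linalg.norm(Phi.T @ residual, axis=1)
        scores[support] = -np.inf                      # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Orthogonal projection: least-squares refit of all signals on the current support
        coef, *_ = np.linalg.lstsq(Phi[:, support], S, rcond=None)
        residual = S - Phi[:, support] @ coef
    C = np.zeros((M, S.shape[1]))
    C[support, :] = coef
    return C, support
```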

Numerical experiments

Some computer simulations have been carried out in order to evaluate the algorithms described in the above sections. The results obtained from these numerical studies are detailed in this section.

Conclusions

This paper aimed at surveying and comparing different simultaneous sparse signal approximation algorithms available in the statistical learning and signal processing communities.

These algorithms usually solve a relaxed version of an intractable problem, where the relaxation is based on a sparsity-inducing mixed-norm on the approximation coefficients. In the first part of the paper, we have described several algorithms which address the most common SSA problem, which is a convex problem with an $\ell_1$…

Acknowledgments

We would like to thank the anonymous reviewers for their comments. This work was supported in part by the IST Program of the European Community, under the PASCAL2 Network of Excellence, IST-216886. This publication only reflects the authors' views. This work was also partly supported by grants from the ANR ClasSel 08-EMER and ASAP ANR-09-EMER-001.

References (64)

  • D. Wipf et al.

    Robust Bayesian estimation of the location, orientation and time course of multiple correlated neural sources using MEG

    NeuroImage

    (2010)
  • R.G. Baraniuk et al.

    Model-based compressive sensing

    IEEE Transactions on Information Theory

    (2010)
  • D. Bertsekas et al.

    Convex Analysis and Optimization

    (2003)
  • S. Boyd et al.

    Convex Optimization

    (2004)
  • E. Candès et al.

    Enhancing sparsity by reweighted $\ell_1$ minimization

    Journal of Fourier Analysis and Applications

    (2008)
  • R. Chartrand, W. Yin, Iteratively reweighted algorithms for compressive sensing, in: 33rd International Conference on...
  • J. Chen et al.

    Sparse representations for multiple measurement vectors (MMV) in an overcomplete dictionary

  • S. Chen et al.

    Atomic decomposition by basis pursuit

    SIAM Journal on Scientific Computing

    (1999)
  • S. Cotter et al.

    Sparse solutions to linear inverse problems with multiple measurement vectors

    IEEE Transactions on Signal Processing

    (2005)
  • I. Daubechies et al.

    An iterative thresholding algorithm for linear inverse problems with a sparsity constraint

    Communications on Pure and Applied Mathematics

    (2004)
  • I. Daubechies et al.

    Iteratively reweighted least squares minimization for sparse recovery

    Communications on Pure and Applied Mathematics

    (2010)
  • M. Duarte, V. Cevher, R. Baraniuk, Model-based compressive sensing for signal ensembles, in: Proceedings of the 47th...
  • M. Elad

    Why simple shrinkage is still relevant for redundant representations?

    IEEE Transactions on Information Theory

    (2006)
  • Y. Eldar et al.

    Compressed sensing of block-sparse signals: uncertainty relations and efficient recovery

    IEEE Transactions on Signal Processing

    (2010)
  • Y. Eldar et al.

    Robust recovery of signals from a structured union of subspaces

    IEEE Transactions on Information Theory

    (2009)
  • J. Fan et al.

    Variable selection via nonconcave penalized likelihood and its oracle properties

    Journal of the American Statistical Association

    (2001)
  • M. Figueiredo, J. Bioucas-Dias, R. Nowak, Majorization-minimization algorithms for wavelet-based image restoration....
  • M. Fornasier et al.

    Recovery algorithms for vector valued data with joint sparsity constraints

    SIAM Journal on Numerical Analysis

    (2008)
  • J. Friedman et al.

    Pathwise coordinate optimization

    The Annals of Applied Statistics

    (2007)
  • G. Gasso et al.

    Recovering sparse signals with a certain family of non-convex penalties and DC programming

    IEEE Transactions on Signal Processing

    (2009)
  • M. Girolami et al.

    Hierarchic Bayesian models for kernel learning

  • Y. Grandvalet

    Least absolute shrinkage is equivalent to quadratic penalization

