Review: Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms
Introduction
For several years now, there has been considerable interest in sparse signal approximation. This interest stems from the frequent need of practitioners to represent data in the most parsimonious way.
Recently, researchers have focused their efforts on a natural extension of the sparse approximation problem: the problem of finding jointly sparse representations of multiple signal vectors. This problem is also known as simultaneous sparse approximation, and it can be stated as follows. Suppose we have several signals describing the same phenomenon, each contaminated by noise. We want to find the sparsest approximation of each signal using the same set of elementary functions. Hence, the problem consists in finding the best approximation of each signal while controlling the number of functions involved across all the approximations.
Such a situation arises in many different application domains such as sensor network signal processing [30], [11], neuroelectromagnetic imaging [22], [37], [59], source localization [31], image restoration [17], distributed compressed sensing [27], and signal and image processing [24], [49].
Formally, the problem of simultaneous sparse approximation is the following. Suppose that we have measured $L$ signals $\{\mathbf{s}_i\}_{i=1}^L$, where each signal is of the form
$$\mathbf{s}_i = \boldsymbol{\Phi}\mathbf{c}_i + \boldsymbol{\epsilon}_i$$
where $\mathbf{s}_i \in \mathbb{R}^N$, $\boldsymbol{\Phi} \in \mathbb{R}^{N \times M}$ is a matrix of unit-norm elementary functions, $\mathbf{c}_i \in \mathbb{R}^M$ is a weighting vector and $\boldsymbol{\epsilon}_i$ is a noise vector. $\boldsymbol{\Phi}$ will be denoted in the sequel as the dictionary matrix. Since we have several signals, the overall measurements can be written as
$$\mathbf{S} = \boldsymbol{\Phi}\mathbf{C} + \mathbf{E} \qquad (1)$$
with $\mathbf{S} = [\mathbf{s}_1, \ldots, \mathbf{s}_L]$ a signal matrix, $\mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_L]$ a coefficient matrix and $\mathbf{E}$ a noise matrix. Note that in the sequel, we have adopted the following notation: $\mathbf{c}_{i,\cdot}$ and $\mathbf{c}_{\cdot,j}$ respectively denote the $i$-th row and $j$-th column of matrix $\mathbf{C}$, and $c_{i,j}$ is the $i$-th element in the $j$-th column of $\mathbf{C}$.
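For concreteness, the measurement model above (signal matrix = dictionary times jointly row-sparse coefficient matrix, plus noise) can be simulated as follows. This is only a sketch; the dimensions, noise level and number of active rows are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L, k = 64, 128, 4, 5          # signal dim, nb of atoms, nb of signals, nb of active rows

# Dictionary of unit-norm atoms (columns)
Phi = rng.standard_normal((N, M))
Phi /= np.linalg.norm(Phi, axis=0)

# Coefficient matrix with a common sparsity profile: only k non-zero rows
C = np.zeros((M, L))
active = rng.choice(M, size=k, replace=False)
C[active] = rng.standard_normal((k, L))

E = 0.01 * rng.standard_normal((N, L))   # additive noise
S = Phi @ C + E                          # measured signal matrix
```

All signals (columns of S) are thus synthesized from the same k atoms, which is exactly the shared-sparsity-profile hypothesis discussed next.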
For the simultaneous sparse approximation (SSA) problem, the goal is then to recover the matrix $\mathbf{C}$ given the signal matrix $\mathbf{S}$ and the dictionary $\boldsymbol{\Phi}$, under the hypothesis that all signals share the same sparsity profile. This latter hypothesis can also be translated into the coefficient matrix $\mathbf{C}$ having a minimal number of non-zero rows. In order to measure the number of non-zero rows of $\mathbf{C}$, a possible criterion is the so-called row-support or row-diversity measure of a coefficient matrix, defined as
$$\operatorname{rowsupp}(\mathbf{C}) = \{ i \in [1, \ldots, M] : c_{i,k} \neq 0 \text{ for some } k \}.$$
The row-support of $\mathbf{C}$ tells us which atoms of the dictionary have been used for building the signal matrix. Hence, if the cardinality of the row-support is lower than the dictionary cardinality, at least one atom of the dictionary has not been used for synthesizing the signal matrix. Then, the row-$\ell_0$ pseudo-norm of a coefficient matrix can be defined as
$$\|\mathbf{C}\|_{\mathrm{row}\text{-}0} = |\operatorname{rowsupp}(\mathbf{C})|.$$
According to this definition, the simultaneous sparse approximation problem can be stated as
$$\min_{\mathbf{C}} \; \frac{1}{2}\|\mathbf{S} - \boldsymbol{\Phi}\mathbf{C}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{C}\|_{\mathrm{row}\text{-}0} \leq T \qquad (2)$$
where $\|\cdot\|_F$ is the Frobenius norm and $T$ a user-defined parameter that controls the sparsity of the solution. Note that the problem can also take the different form
$$\min_{\mathbf{C}} \; \|\mathbf{C}\|_{\mathrm{row}\text{-}0} \quad \text{s.t.} \quad \|\mathbf{S} - \boldsymbol{\Phi}\mathbf{C}\|_F^2 \leq \epsilon. \qquad (3)$$
For this latter formulation, the problem translates into minimizing the number of non-zero rows in the coefficient matrix while keeping control of the approximation error. Both problems (2) and (3) are appealing for their formulation clarity. However, similarly to the single-signal approximation case, solving these optimization problems is notoriously intractable because $\|\cdot\|_{\mathrm{row}\text{-}0}$ is a discrete-valued function. Two ways of addressing the intractable problems (2) and (3) are possible: relaxing the problem by replacing the $\|\cdot\|_{\mathrm{row}\text{-}0}$ function with a more tractable row-diversity measure, or using some suboptimal algorithms.
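The row-support and the row-ℓ0 pseudo-norm described above are straightforward to compute; a minimal sketch:

```python
import numpy as np

def rowsupp(C, tol=0.0):
    """Row-support: indices of the non-zero rows of a coefficient matrix C."""
    return np.flatnonzero(np.linalg.norm(C, axis=1) > tol)

def row_l0(C):
    """Row-l0 pseudo-norm: number of non-zero rows of C."""
    return rowsupp(C).size

C = np.array([[1.0, -2.0],
              [0.0,  0.0],
              [0.5,  0.0]])
# Rows 0 and 2 are active, whatever the column pattern inside them,
# so rowsupp(C) = [0, 2] and the row-l0 pseudo-norm is 2.
```

Note that a row counts as "used" as soon as any one of its entries is non-zero, which is what makes this a shared (joint) sparsity measure.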
A large class of relaxed versions of $\|\cdot\|_{\mathrm{row}\text{-}0}$ proposed in the literature is encompassed by the following form [52]:
$$J_{p,q}(\mathbf{C}) = \sum_{i=1}^{M} \|\mathbf{c}_{i,\cdot}\|_q^p = \sum_{i=1}^{M} \Big( \sum_{j} |c_{i,j}|^q \Big)^{p/q}$$
where typically $p \leq 1$ and $q \geq 1$. This penalty term can be interpreted as the $\ell_p$ quasi-norm of the sequence $\{\|\mathbf{c}_{i,\cdot}\|_q\}_i$ to the power $p$. According to this relaxed version of the row-diversity measure, most of the algorithms proposed in the literature try to solve the relaxed problem
$$\min_{\mathbf{C}} \; \frac{1}{2}\|\mathbf{S} - \boldsymbol{\Phi}\mathbf{C}\|_F^2 + \lambda \, J_{p,q}(\mathbf{C}) \qquad (4)$$
where $\lambda$ is another user-defined parameter that balances the approximation error and the sparsity-inducing penalty $J_{p,q}(\mathbf{C})$. The choice of $p$ and $q$ results in a compromise between the row-support sparsity and the convexity of the optimization problem. Indeed, problem (4) is known to be convex when $p \geq 1$ and $q \geq 1$, while it is known to produce a row-sparse matrix if $p \leq 1$ (due to the penalty function's singularity at $\mathbf{c}_{i,\cdot} = \mathbf{0}$ [15]).
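The relaxed penalty is equally simple to implement: sum, over the rows, of the row q-norms raised to the power p. The values of p and q in the example below are arbitrary:

```python
import numpy as np

def J_pq(C, p, q):
    """Mixed-norm penalty: sum_i ||c_{i,.}||_q ** p over the rows of C."""
    row_norms = np.sum(np.abs(C) ** q, axis=1) ** (1.0 / q)
    return np.sum(row_norms ** p)

C = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
# p=1, q=2: sum of row l2-norms = 5 + 0 + 1 = 6
```

For p = 1 and q = 2 this is the penalty of the convex case discussed in Section 2; pushing p below 1 makes the penalty non-convex but more aggressively row-sparsifying.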
The simultaneous sparse approximation problem as described here is equivalent to several other problems studied in other research communities. Problem (4) can be reformulated so as to make clear its relation with some other problems, denoted as the group lasso [61] or block-sparse regression [28] in statistics, and block-sparse signal recovery [13] in signal processing. Indeed, let us define $\tilde{\mathbf{s}} = \operatorname{vec}(\mathbf{S})$, $\tilde{\mathbf{c}} = \operatorname{vec}(\mathbf{C})$ and $\tilde{\boldsymbol{\Phi}} = \mathbf{I}_L \otimes \boldsymbol{\Phi}$; then, problem (4) can be equivalently rewritten as
$$\min_{\tilde{\mathbf{c}}} \; \frac{1}{2}\|\tilde{\mathbf{s}} - \tilde{\boldsymbol{\Phi}}\tilde{\mathbf{c}}\|_2^2 + \lambda \sum_{i=1}^{M} \|\tilde{\mathbf{c}}_{G_i}\|_q^p$$
where $G_i = \{i, M+i, \ldots, (L-1)M+i\}$ gathers all the indices in $\tilde{\mathbf{c}}$ related to the $i$-th element of the dictionary matrix $\boldsymbol{\Phi}$.
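This vectorized reformulation can be checked numerically: stacking the columns of the signal and coefficient matrices and forming the block-diagonal dictionary with a Kronecker product leaves the data-fit term unchanged, and each group indeed collects one row of the coefficient matrix across the stacked blocks. The dimensions below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, L = 6, 10, 3
Phi = rng.standard_normal((N, M))
C = rng.standard_normal((M, L))

# vec(Phi @ C) = (I_L kron Phi) @ vec(C), with vec stacking the columns
s_tilde = (Phi @ C).flatten(order='F')
Phi_tilde = np.kron(np.eye(L), Phi)
c_tilde = C.flatten(order='F')
assert np.allclose(s_tilde, Phi_tilde @ c_tilde)

# The group for atom i collects its coefficient in each of the L stacked blocks
i = 4
G_i = np.arange(L) * M + i
assert np.allclose(c_tilde[G_i], C[i])   # = the i-th row of C
```

Hence the row q-norm of C equals the q-norm of the corresponding group of the vectorized coefficients, which is exactly the group-lasso penalty structure.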
As we have already stated, simultaneous sparse approximation and equivalent problems have been investigated by diverse communities and applied to various application problems, and the literature on these topics has grown considerably over the last few years. The aim of this work is to gather results from different communities (statistics, signal processing and machine learning) so as to survey, analyze and compare different proposed algorithms for solving optimization problem (4) for different values of p and q. In particular, instead of merely summarizing existing results, we enrich our survey by providing results such as proofs of convergence, formal relations between different algorithms, and experimental comparisons that were not available in the literature. These experimental comparisons essentially focus on the computational complexity of the different methods, on their ability to recover the signal sparsity pattern, and on the quality of the approximation they provide, evaluated through the mean-square error.
Note that recently, there have been various works addressing the simultaneous sparse approximation problem in the noiseless case [47], [45], and many works aiming at providing theoretical approximation guarantees in both the noiseless and noisy cases [6], [13], [33], [14], [25], [29]. Surveying these works is beyond the scope of this paper, and we refer interested readers to these pointers and the references therein.
The most frequent case of problem (4) encountered in the literature is the one where $p=1$ and $q=2$. This case is the simplest one, since it leads to a convex problem. We discuss different algorithms for solving this problem in Section 2. Then, we survey in Section 3 generic algorithms that are able to solve our approximation problems for different values of $p$ and $q$. These algorithms are essentially iteratively reweighted $\ell_1$ or $\ell_2$ algorithms. Many works in the literature have focused on one algorithm for solving a particular case of problem (4). These specific algorithms are discussed in Section 4. Notably, we present some greedy algorithms and discuss a sparse Bayesian learning algorithm. The experimental comparisons we carried out aim at clarifying how each algorithm behaves in terms of computational complexity, sparsity recovery and approximation mean-square error. These results are described in Section 5. For the sake of reproducibility, the code used in this paper has been made freely available.1 Final discussions and conclusions are given in Section 6.
Solving the $\ell_1$–$\ell_2$ optimization problem
When $p=1$, $q=2$, optimization problem (4) becomes a particular problem named M-BP, for Multiple Basis Pursuit, in the sequel. It is a special case that deserves attention. Indeed, it seems to be the most natural extension of the so-called Lasso problem [50] or Basis Pursuit Denoising [7], since for $L=1$, it can easily be shown that problem (4) reduces to these two problems. The key point of this case is that it yields a convex optimization problem and thus can benefit from all the properties
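For intuition, the p = 1, q = 2 problem can be minimized by proximal gradient descent, whose proximal step is a row-wise soft thresholding. The sketch below is our own illustrative solver, not necessarily the algorithm developed in this section; the fixed step size and iteration count are arbitrary choices:

```python
import numpy as np

def row_soft_threshold(C, tau):
    """Proximal operator of tau * sum_i ||c_{i,.}||_2: shrink each row toward 0."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * C

def m_bp_ista(S, Phi, lam, n_iter=500):
    """Plain proximal-gradient (ISTA-style) sketch for the p=1, q=2 problem
    min_C 0.5 * ||S - Phi C||_F^2 + lam * sum_i ||c_{i,.}||_2."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1 / Lipschitz constant of the gradient
    C = np.zeros((Phi.shape[1], S.shape[1]))
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ C - S)
        C = row_soft_threshold(C - step * grad, step * lam)
    return C
```

Because the threshold acts on whole rows (through their l2-norm) rather than on individual entries, entire rows are driven to zero at once, which enforces the shared sparsity profile.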
Generic algorithms for large classes of p and q
In this section, we present several algorithms that are able to solve the simultaneous sparse approximation problem for large classes of $p$ and $q$. In the first part of the section, we review the M-FOCUSS algorithm of Cotter et al. [8], which addresses the case where $p \leq 1$ and $q=2$. In the second part, we make clear how the penalization term $J_{p,q}(\mathbf{C})$ with $p=1$ and $q \leq 2$ is related to automatic relevance determination. From this novel insight, we then propose an iterative reweighted least-square
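As an illustration of the reweighting principle, here is a minimal M-FOCUSS-style iteration: each pass solves a row-weighted ridge problem, and rows whose norms shrink get ever smaller weights until they vanish. The ridge regularization, pruning threshold and starting point below are our own choices for the example, not those of [8]:

```python
import numpy as np

def m_focuss(S, Phi, p=0.8, lam=1e-3, n_iter=100, eps=1e-8):
    """Iterative reweighted least-squares sketch for the q=2, p<=1 penalty.
    Rows with small norm get small weights and are progressively zeroed."""
    N, M = Phi.shape
    C = np.ones((M, S.shape[1]))
    for _ in range(n_iter):
        w = np.linalg.norm(C, axis=1) ** (1.0 - p / 2.0)   # row weights
        W = Phi * w                                        # Phi @ diag(w)
        # C = diag(w) @ W^T @ (W W^T + lam I)^{-1} @ S  (regularized weighted LS)
        Q = np.linalg.solve(W @ W.T + lam * np.eye(N), S)
        C = w[:, None] * (W.T @ Q)
        C[np.linalg.norm(C, axis=1) < eps] = 0.0           # prune negligible rows
    return C
```

Once a row is pruned its weight stays at zero, so the iteration monotonically narrows the candidate row-support.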
Specific case algorithms
In this section, we survey some algorithms that provide solutions of the SSA problem (4) when $p=0$. These algorithms are different in nature and provide different qualities of approximation. The first two algorithms we detail are greedy algorithms in the flavor of Matching Pursuit [32]. The third one is based on sparse Bayesian learning and is related to an iterative reweighted $\ell_1$ algorithm [60].
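As an illustration of the greedy principle, here is a minimal sketch in the spirit of simultaneous orthogonal matching pursuit. The selection score (summed absolute correlations over the L signals) and the fixed number of atoms as stopping rule are simplifying choices of ours, not necessarily those of the algorithms detailed in this section:

```python
import numpy as np

def somp(S, Phi, n_atoms):
    """Simultaneous greedy pursuit sketch: at each step, pick the atom whose
    correlations with the current residuals, aggregated over all signals, are
    largest, then refit all signals jointly on the selected atoms."""
    support, R = [], S.copy()
    for _ in range(n_atoms):                         # assumes n_atoms >= 1
        scores = np.sum(np.abs(Phi.T @ R), axis=1)   # aggregate correlation per atom
        scores[support] = -np.inf                    # never pick an atom twice
        support.append(int(np.argmax(scores)))
        sub = Phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, S, rcond=None)
        R = S - sub @ coef                           # updated residual matrix
    C = np.zeros((Phi.shape[1], S.shape[1]))
    C[support] = coef
    return C, sorted(support)
```

The atom selection is shared across all L signals, so the recovered coefficient matrix has, by construction, a common row-support of size n_atoms.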
Numerical experiments
Some computer simulations have been carried out in order to evaluate the algorithms described in the above sections. The results obtained from these numerical studies are detailed in this section.
Conclusions
This paper aimed at surveying and comparing different simultaneous sparse signal approximation algorithms available in the statistical learning and signal processing communities.
These algorithms usually solve a relaxed version of an intractable problem, where the relaxation is based on a sparsity-inducing mixed norm on the approximation coefficients. In the first part of the paper, we have described several algorithms which address the most common SSA problem, which is a convex problem with an
Acknowledgments
We would like to thank the anonymous reviewers for their comments. This work was supported in part by the IST Program of the European Community, under the PASCAL2 Network of Excellence, IST-216886. This publication only reflects the authors' views. This work was partly supported by grants from the ANR ClasSel 08-EMER and ASAP ANR-09-EMER-001.
References (64)
- S. Foucart, M.-J. Lai, Sparsest solutions of underdetermined linear systems via ℓq-minimization for 0 < q ≤ 1, Applied and Computational Harmonic Analysis (2009)
- I.F. Gorodnitsky, J.S. George, B.D. Rao, Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm, Journal of Electroencephalography and Clinical Neurophysiology (1995)
- et al., Sparse approximations in signal and image processing, Signal Processing (2006)
- et al., Distributed signal processing in sensor networks, IEEE Signal Processing Magazine (2006)
- D. Needell, J.A. Tropp, CoSaMP: iterative signal recovery from incomplete and inaccurate samples, Applied and Computational Harmonic Analysis (2009)
- et al., An empirical Bayesian solution to the source reconstruction problem in EEG, NeuroImage (2005)
- et al., Input selection and shrinkage in multiresponse linear regression, Computational Statistics and Data Analysis (2007)
- et al., On the use of sparse signal decomposition in the analysis of multi-channel surface electromyograms, Signal Processing (2006)
- J.A. Tropp, Algorithms for simultaneous sparse approximation. Part II: convex relaxation, Signal Processing (2006)
- J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. Part I: greedy pursuit, Signal Processing (2006)