
Pattern Recognition Letters

Volume 31, Issue 13, 1 October 2010, Pages 1959-1964

Improved Group Sparse Classifier

https://doi.org/10.1016/j.patrec.2010.06.014

Abstract

This work proposes a new classifier based on the assumption that the training samples of a particular class approximately form a linear basis for any new test sample belonging to that class. This is not a new assumption; two previous works, namely the Sparse Classifier (Yang et al., 2009) and the Group Sparse Classifier (Majumdar and Ward, 2009a), have been built upon it. However, both previous works suffer from certain shortcomings: the optimization problems they propose do not capture all the implications of the said assumption. This work accounts for all the intricacies of the assumption and consequently requires solving a new non-convex optimization problem. We develop an elegant approach to solving this problem based on the Iterative Reweighted Least Squares method.

Our classifier is very flexible; the previous classifiers are in fact two special cases of the proposed one. The proposed classifier has been rigorously compared against the previous classifiers stemming from the same assumption on benchmark classification databases. The results indicate that in most cases the proposed classifier is significantly better than the previous ones, and in the worst case it is as good as the previous ones.

Introduction

Recently there has been interest in a classification model where the training samples of a particular class are assumed to form a linear basis for approximately representing any new test sample belonging to that class. This assumption has led to the development of new classifiers such as the Sparse Classifier (SC) (Yang et al., 2009), the Group Sparse Classifier (GSC) (Majumdar and Ward, 2009a) and its fast version, the Fast Group Sparse Classifier (FGSC) (Majumdar and Ward, 2009b). Formally, this assumption can be expressed as

$$v_{k,\mathrm{test}} = \alpha_{k,1} v_{k,1} + \alpha_{k,2} v_{k,2} + \cdots + \alpha_{k,n_k} v_{k,n_k} + \varepsilon_k, \qquad (1)$$

where $v_{k,\mathrm{test}}$ is a test sample of the $k$th class, $v_{k,i}$ is the $i$th training sample of the $k$th class, $\alpha_{k,i}$ is the corresponding weight, and $\varepsilon_k$ is the approximation error.
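
To make (1) concrete, the following minimal sketch (Python with NumPy; the dimensions, noise level and synthetic data are illustrative assumptions, not taken from the paper) generates a test sample obeying the assumption for one class and recovers the weights by ordinary least squares:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_k = 50, 8                                   # sample dimension, training samples in class k
    V_k = rng.standard_normal((d, n_k))              # columns are the training samples v_{k,i}
    alpha_true = rng.standard_normal(n_k)
    v_test = V_k @ alpha_true + 0.01 * rng.standard_normal(d)   # test sample obeying (1)

    # Least-squares estimate of the class-k weights; a small residual is what
    # the assumption predicts when the test sample really belongs to class k.
    alpha_hat, *_ = np.linalg.lstsq(V_k, v_test, rcond=None)
    print(np.linalg.norm(v_test - V_k @ alpha_hat))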

Geometrically, this assumption implies that each class comprises a few subspaces, and each subspace is spanned by some training vectors; a new test sample approximately lies on the union of these subspaces. To take a concrete example, consider the problem of face recognition. Faces appear in different poses: frontal, left profile, right profile, etc. Each pose can be assumed to lie on a subspace. If there are enough training samples, a few samples in each pose will be sufficient to capture the variability of that pose. Now if there is a new test sample with a pose somewhere between frontal and left profile, it may be expressed as a linear combination of the training samples in the frontal and left profile poses. The relative contribution of the training samples in each pose is decided by their weights (α).

All the aforementioned classifiers (SC, GSC and FGSC) determine the unknown weights ($\alpha_{k,i}$'s) by solving different optimization problems and use these weights to determine the class of the test sample. The first step for all of them is to express the assumption in terms of all the training samples instead of just one class as in (1). In compact matrix-vector notation it can be expressed as

$$v_{k,\mathrm{test}} = V\alpha + \varepsilon, \qquad (2)$$

where $V = [v_{1,1} \mid \cdots \mid v_{1,n_1} \mid \cdots \mid v_{k,1} \mid \cdots \mid v_{k,n_k} \mid \cdots \mid v_{C,1} \mid \cdots \mid v_{C,n_C}]$ and $\alpha = [\alpha_{1,1}, \ldots, \alpha_{1,n_1}, \ldots, \alpha_{k,1}, \ldots, \alpha_{k,n_k}, \ldots, \alpha_{C,1}, \ldots, \alpha_{C,n_C}]^T$.
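
As a small illustration of this bookkeeping (a sketch with an assumed column-wise layout, not code from the paper), $V$ can be assembled from per-class training matrices while recording which columns belong to which class; these labels are needed later to split $\alpha$ class-wise and compute per-class residuals:

    import numpy as np

    def stack_training_samples(class_matrices):
        """Concatenate per-class matrices (columns are samples) into V as in (2),
        and return a label vector telling which class each column of V came from."""
        V = np.hstack(class_matrices)
        labels = np.concatenate([np.full(M.shape[1], k)
                                 for k, M in enumerate(class_matrices)])
        return V, labels

With $V$ and the label vector in hand, any of the optimization problems discussed below yields a weight vector $\alpha$ that can be split per class to measure how well each class reconstructs the test sample.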

Given $v_{k,\mathrm{test}}$ and $V$, the problem is to determine the class of the test sample. To achieve this from the said assumption, it is required to solve for the unknown weight vector $\alpha$. This is done by solving an optimization problem of the form

$$\hat{\alpha} = \min_{\alpha} \varphi(\alpha) \quad \text{s.t.} \quad \|v_{k,\mathrm{test}} - V\alpha\|_2 \le \sigma. \qquad (3)$$

The objective function $\varphi(\alpha)$ encodes prior information about the nature of $\alpha$, and the constraint arises from the assumption that the misfit is Normally distributed.

The different classification methods vary in their choice of the objective function. In the SC (Yang et al., 2009) the weight vector $\alpha$ is assumed to be sparse, and the $l_1$-norm of the weight vector ($\|\alpha\|_1$) is therefore employed as the objective function. The choice of the $l_1$-norm as a sparsifying function can be traced back to studies in statistics (Tibshirani, 1996) and signal processing (Chen et al., 2001). However, the $l_1$-norm is not an appropriate choice in this case because, as pointed out by Zou and Hastie (2005), it selects only a single sample from a group of correlated training samples. For the face recognition example, this implies that if the test sample has a pose between frontal and left profile, the SC (employing the $l_1$-norm) will try to express it either as a frontal or as a left profile, but not as a combination of both. Clearly this is not desired.
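
As a rough sketch of the SC pipeline described above (not the authors' implementation: the constrained $l_1$ problem is replaced here by its Lagrangian form and solved with plain iterative soft-thresholding, and the parameter values are illustrative assumptions):

    import numpy as np

    def soft_threshold(x, t):
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def sparse_classifier(V, labels, v_test, lam=0.01, n_iter=500):
        """l1-regularised weights via ISTA, then assign the class whose training
        samples give the smallest reconstruction residual."""
        step = 1.0 / np.linalg.norm(V, 2) ** 2       # 1 / Lipschitz constant of the gradient
        alpha = np.zeros(V.shape[1])
        for _ in range(n_iter):
            grad = V.T @ (V @ alpha - v_test)
            alpha = soft_threshold(alpha - step * grad, step * lam)
        classes = np.unique(labels)
        residuals = [np.linalg.norm(v_test - V[:, labels == k] @ alpha[labels == k])
                     for k in classes]
        return classes[int(np.argmin(residuals))]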

To address this issue, the GSC was proposed by Majumdar and Ward (2009a). It employs a mixed norm as the objective function, given by $\|\alpha\|_{2,1} = \sum_k \|\alpha_k\|_2$. This norm ensures that all the training samples of a selected class are used, while the number of selected classes remains sparse (Huang and Zhang, 2008; Eldar et al., 2009). The GSC (and the FGSC, which is a fast implementation of the GSC) showed good results, but it is not perfect either, because the inner $l_2$-norm of the mixed norm selects all the training samples from a particular class. Referring again to face recognition: if a test image lies somewhere between the left profile and frontal views, this classifier will try to represent it as a combination of all the views (left, frontal and right). This is erroneous as well.
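
The all-or-nothing behaviour of the inner $l_2$-norm can be seen directly in the group soft-thresholding (proximal) step that GSC-type solvers rely on; the sketch below (an illustration, not the authors' FGSC algorithm) shrinks each class block as a whole, so a class is either kept with all its samples or dropped entirely:

    import numpy as np

    def mixed_norm_21(alpha, labels):
        """||alpha||_{2,1}: sum over classes of the l2-norm of that class's weights."""
        return sum(np.linalg.norm(alpha[labels == k]) for k in np.unique(labels))

    def group_soft_threshold(alpha, labels, t):
        """Proximal operator of t * ||.||_{2,1}; each class block shrinks jointly."""
        out = alpha.copy()
        for k in np.unique(labels):
            block = alpha[labels == k]
            nrm = np.linalg.norm(block)
            out[labels == k] = block * max(1.0 - t / nrm, 0.0) if nrm > 0 else 0.0
        return out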

The SC and the GSC are two extremes: the SC selects only one sample from the entire class (in the extreme case), while the GSC selects all of them. Ideally we would like the classifier to represent the test sample as a combination of only the relevant subspaces. For the face recognition problem, this means that if a test sample lies somewhere between the left and frontal views, it should be represented only by training samples corresponding to those views and not by training samples from the right profile view. Existing classifiers like the SC and the GSC cannot address this problem. This paper is dedicated to finding methods that resolve this shortcoming so that the classifier can operate between the two extremes.

The other issue tackled in this paper is the relaxation of the $l_2$-norm misfit in (3). There is no reason, physical or mathematical, to assume that the misfit is Normally distributed. In this work we therefore experiment with other norms for the misfit.

This paper makes two major contributions: (i) changing the objective function to address the aforementioned issues; and (ii) changing the norm of the misfit. Both lead to new optimization problems which are solved efficiently in this paper. The rest of the paper is organized as follows. The next section discusses the background of this work. The proposed classification model is presented in Section 3. Section 4 details the optimization problem behind the proposed classifier. The experimental results are described in Section 5, and the conclusions are discussed in Section 6.

Section snippets

Background

The assumption that a new test sample can be approximated as a linear combination of the training samples of its class was proposed by Yang et al. (2009) for the Sparse Classifier (SC). Following the same assumption, the Group Sparse Classifier (GSC) was later developed by Majumdar and Ward (2009a). The fast version of the GSC (FGSC) is not a different classifier, but employs fast algorithms to solve the same optimization problem. Therefore we will discuss the SC and the GSC in detail to understand

Proposed classifier

We have discussed the shortcomings of the SC and the GSC in terms of the objective function in the optimization problem (3), which encodes prior knowledge about the solution. The other issue common to both the SC and the GSC is the $l_2$-norm misfit. Nothing dictates that the error term in (2) is Gaussian; therefore the $l_2$-norm misfit may not be optimal, and it is worth exploring other $l_q$-norms for the misfit.

We propose to solve (2) via the following optimization problem,

$$\hat{\alpha} = \min_{\alpha} \|\alpha\|_{m,p} \quad \text{such that} \quad \|v_{\mathrm{test}} - V\alpha\|_q \le \sigma.$$

Optimization algorithm

The optimization problem (8) is the same as the following problem, because the minimizer of $\|\cdot\|_t$ is the same as the minimizer of $\|\cdot\|_t^t$:

$$\hat{\alpha} = \min_{\alpha} \|\alpha\|_{m,p}^p \quad \text{such that} \quad \|v_{\mathrm{test}} - V\alpha\|_q^q < \sigma.$$

In standard optimization terminology, the term $\|\alpha\|_{m,p}^p$ is the objective function and $\|v_{\mathrm{test}} - V\alpha\|_q^q < \sigma$ is the constraint. This constrained form has the following equivalent unconstrained formulation for a proper choice of the Lagrangian multiplier $\lambda$:

$$\hat{\alpha} = \min_{\alpha} \|\alpha\|_{m,p}^p + \lambda \|v_{\mathrm{test}} - V\alpha\|_q^q.$$

The constant $\lambda$ is related to $\sigma$. But the relationship between $\lambda$
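
Since the snippet above only outlines the strategy, here is a minimal Iterative Reweighted Least Squares sketch for the unconstrained formulation, under stated simplifications: an element-wise $l_p$ penalty stands in for the mixed $l_{m,p}$ norm (whose weights would instead be built from per-class group norms), and the parameter values are illustrative assumptions:

    import numpy as np

    def irls(V, v, p=0.8, q=2.0, lam=0.1, n_iter=50, eps=1e-6):
        """Sketch of IRLS for  min_a ||a||_p^p + lam * ||v - V a||_q^q.
        Each iteration replaces both terms by weighted quadratics and solves
        the resulting normal equations."""
        alpha = np.linalg.lstsq(V, v, rcond=None)[0]          # l2 initialisation
        for _ in range(n_iter):
            r = v - V @ alpha
            w_a = (alpha ** 2 + eps) ** (p / 2.0 - 1.0)       # penalty reweighting
            w_r = (r ** 2 + eps) ** (q / 2.0 - 1.0)           # misfit reweighting
            A = np.diag(w_a) + lam * V.T @ (w_r[:, None] * V)
            b = lam * V.T @ (w_r * v)
            alpha = np.linalg.solve(A, b)
        return alpha

For $q = 2$ the misfit weights reduce to constants and each iteration is a standard ridge-like solve; letting $p$ tend to zero pushes the solution toward the sparsest representation, which is the regime the non-convex objective targets.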

Experimental evaluation

The experimental evaluation was carried out in two parts. In the first part, experiments were carried out on benchmark datasets from the University of California, Irvine Machine Learning (UCI ML) repository. The second part of the experiments was carried out on two widely used face and character recognition datasets, namely YaleB and USPS, respectively.

For the UCI ML datasets, we chose only those datasets which have no missing values or unlabeled samples. The proposed

Conclusion

This paper proposes a new classification algorithm based on a simple assumption: that the training samples of a particular class approximately form a linear basis for any new test sample belonging to that class. The assumption appears simple, but incorporating its implications into the actual optimization problem involved in the classification is far from trivial. Two previous works have already been built on this assumption, namely the Sparse Classifier (SC) (

References (21)

  • E. Amaldi et al. On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theor. Comput. Sci. (1998)
  • R. Chartrand et al. Restricted isometry properties and nonconvex compressive sensing. Inverse Prob. (2008)
  • Chartrand, R., Yin, W., 2008. Iteratively reweighted algorithms for compressive sensing. In: IEEE Internat. Conf. on...
  • S. Chen et al. Atomic decomposition by basis pursuit. SIAM Rev. (2001)
  • Daubechies, I., DeVore, R., Fornasier, M., Sinan Güntürk, C., preprint. Iteratively re-weighted least squares...
  • D.L. Donoho. For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution. Comm. Pure Appl. Math. (2006)
  • Eldar, Y.C., Kuppinger, P., Bölcskei, H., 2009. Compressed Sensing of Block-Sparse Signals: Uncertainty Relations and...
  • Haber, E., 1997. Numerical Strategies for the Solution of Inverse Problems. Ph.D. Thesis, Univ. of British...
  • Huang, J., Zhang, T., 2008. The Benefit of Group Sparsity. Available from:...
  • l1-magic....
There are more references available in the full text version of this article.

Cited by (14)

  • Sparse neighbor representation for classification

    2012, Pattern Recognition Letters
    Citation excerpt:

    SRC cast the recognition problem as one of classifying among multiple linear reconstruction models and argued that new theory from sparse signal representation offers the key to address this problem. The work in (Majumdar and Ward, 2010) is a generalization of SRC, with the original assumption extended to include cases that the test samples can be represented by a nonlinear combination of the training samples. Yang et al. (2011) proposed a robust sparse coding method, by modeling the sparse coding as a sparsity constrained robust regression problem, and it is robust to outliers.

  • Iterative Re-weighted Least Squares Algorithms for Non-negative Sparse and Group-sparse Recovery

    2022, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
  • Class-wise deep dictionary learning

    2017, Proceedings of the International Joint Conference on Neural Networks