
Open Access 22-09-2022 | Original Article

A study of sparse representation-based classification for biometric verification based on both handcrafted and deep learning features

Authors: Zengxi Huang, Jie Wang, Xiaoming Wang, Xiaoning Song, Mingjin Chen

Published in: Complex & Intelligent Systems | Issue 2/2023


Abstract

Biometric verification is generally considered a one-to-one matching task. In contrast, in this paper, we argue that one-to-many competitive matching via sparse representation-based classification (SRC) can bring enhanced verification security and accuracy. SRC-based verification introduces non-target subjects to construct a dynamic dictionary together with the claimed client and encodes the submitted feature over it. Owing to the sparsity constraint, a client can be accepted only when it defeats almost all non-target classes and wins a convincing sparsity-based matching score. This makes the verification more secure than that using one-to-one matching. However, intense competition may also lead to extremely inferior genuine scores when data degeneration occurs. Motivated by these latent benefits and concerns, we study SRC-based verification using two sparsity-based matching measures, three biometric modalities (i.e., face, palmprint, and ear) and their multimodal combinations, based on both handcrafted and deep learning features. We thereby approach a comprehensive study of SRC-based verification, including its methodology, characteristics, merits, challenges, and the directions to resolve them. Extensive experimental results demonstrate the superiority of SRC-based verification, especially when using multimodal fusion and advanced deep learning features. The concerns about its efficiency in large-scale user applications can be readily addressed using a simple dictionary shrinkage strategy based on cluster analysis and random selection of non-target subjects.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

Biometric verification is generally considered a one-to-one matching problem, which is solved by comparing the captured biometric data with the gallery template(s) associated with the claimed identity to produce a matching score [1, 2]. The matching score is then compared with the system's operating threshold to decide whether the user can be authenticated. The operating threshold is chosen in the training phase to minimize some a posteriori performance criterion, e.g., the equal error rate (EER), based on the genuine and impostor score distributions. However, it is unlikely that a sufficiently rich set of representative templates can be collected and/or generated for each client to cover all possible changes, for example, expression, pose, illumination, aging, and occlusion on the face, so as to accurately model the score distributions [1–3]. Data imbalance between the genuine and impostor samples is also a challenge [4]. Moreover, human faces are all somewhat similar, and some subjects may have very similar face images [5]. In real-world applications, it is also unlikely that the distributions of genuine and impostor scores will always be completely non-overlapping. As a result, there is rarely an ideal operating threshold at which both the false accept rate (FAR) and the false reject rate (FRR) are zero. Furthermore, in the test phase, one-to-one matching verification ignores the correlation of the probe sample with other people. Therefore, it is insufficient and insecure to authenticate a user using only one-to-one matching and an imperfect predetermined operating threshold, especially in safety-critical applications in military, security, and finance.
Recently, deep learning (DL)-based approaches have made substantial progress in the computer vision and pattern recognition community [6, 7]. Many deep convolutional neural network (CNN)-based face verification systems have achieved near-perfect performance on large-scale unconstrained benchmarks, such as LFW [8] and MegaFace [9]. However, more and more studies have reported that state-of-the-art (SOTA) DL-based face recognition systems, including VGGFace, SphereFace, and ArcFace, are highly vulnerable to presentation attacks [10, 11], morphing attacks [12], and adversarial perturbations [13]. They validated that the deep models with higher face recognition accuracy show higher levels of vulnerability, and that they are more vulnerable than some approaches using handcrafted features [10–12]. Note that these SOTA face verification approaches use deep CNNs to extract features from the test image and then operate in the conventional one-to-one matching verification framework. Apparently, recent efforts in deep learning-based feature extraction have yet to bring about the desirable security performance of verification systems. It is time to rethink simple one-to-one matching and shift some of the research focus to new classification mechanisms and security measures.
Sparse representation-based classification (SRC) has been studied extensively and proven powerful in biometric identification [14–17]. SRC techniques conduct a one-to-many comparison in a single sparse coding process and are thus naturally suited to the identification task. The idea and implementation of the original SRC model are very simple and straightforward: the test sample is represented as a sparse linear combination of the training samples in an overcomplete dictionary and is then assigned to the class that yields the minimum class-specific reconstruction residual [14]. The dictionary is constructed from the training samples of all classes. Some variants of SRC also expand the dictionary by adding linear and non-linear variation sub-dictionaries to alleviate the insufficient training samples problem [18–20], or directly use a learned dictionary [21]. Recently, several studies have reported that, when using deep CNN features, many SRC extensions obtain significant improvements in accuracy and robustness [16, 22–26].
Inspired by the great success of SRC in identification, some studies have introduced SRC into the unimodal verification of speaker, face, and finger vein [27–32]. Huang et al. [3] reported that multimodal verification using SRC shows very promising verification accuracy and strong resistance to unimodal presentation attacks. SRC-based verification follows similar pre-classification procedures, but it compares only the class-specific sparsity-based matching score associated with the claimed identity with the system's operating threshold, regardless of the remaining scores. From a process point of view, the difference between SRC-based verification and one-to-one matching verification lies in how the matching scores are generated and whether non-target subjects participate in the comparison. A brief overview of the existing literature on SRC and SRC-based verification is presented in the section "Related work".
Although current research has experimentally verified the effectiveness of SRC-based verification in some biometric fields, it has not been studied in depth in terms of its verification characteristics, merits, shortcomings, and challenges. By incorporating non-target subjects, SRC-based verification provides a competing mechanism that allocates class-specific sparsity-based matching scores to all classes in the dictionary via sparse optimization. Owing to the sparsity constraint, SRC allows only one or very few classes to obtain a good matching score, while the remaining classes get inferior scores. To be accepted, the genuine class needs to defeat almost all the non-target subjects in the sparse coding competition and obtain an eligible score that is superior to a certain predetermined operating threshold. Therefore, an acceptance response made by SRC-based verification should be more convincing than one based on one-to-one matching. Essentially, SRC-based verification not only examines the matching score obtained by the claimed client but also implicitly compares the correlations of the query data to the client and many non-target subjects, and thereby offers enhanced protection for identity security.
On the other hand, biometric sample quality often fluctuates with illumination, pose, and appearance variations [33–35]. Moderate data degeneration is inevitable in practical applications. Under these circumstances, once a genuine client fails to obtain a top rank in the competition, it is more likely to receive an extremely inferior score and thus be falsely rejected. It is also impractical to authenticate a user with an overly relaxed operating threshold, which would lead to an excessively high false accept rate. Therefore, there is an urgent need to study how to mitigate this problem, or under what circumstances SRC-based verification is preferable. Moreover, the heavy computational burden and the accuracy degradation in large-scale user applications still haunt SRC-based identification [36, 37]. Since SRC-based verification uses a similar classification mechanism, how to make it free from this problem is also critical.
In this work, we focus on the theoretical analysis and experimental validation of the characteristics, merits, and challenges of verification with the general SRC model [14], rather than designing a new sparse coding model. We also try to explore directions for resolving its challenges. To these ends, we study SRC-based verification using two sparsity-based matching measures on three biometric traits, i.e., face, palmprint, and ear, and their multimodal combinations, based on DCT and ArcFace-based CNN [7] features. The two sparsity-based matching measures are the sparse coding error (SCE) and the sparse contribution rate (SCR) [14]. The SCE signals the representation ability of SRC, while the SCR reflects the sparseness of the coding coefficients. In SRC-based multimodal verification, we apply the Sum rule to combine the matching scores of all modalities. Well-known multimodal methods based on one-to-one matching and cosine similarity are used as competitors, e.g., the support vector machine (SVM) and the likelihood ratio (LLR) [38].
Our contributions in this paper are summarized as follows.
1. We present a comprehensive study of SRC-based verification that has not been reported in the existing literature, covering its methodology, characteristics, merits, challenges, and the directions to resolve them.
2. Extensive experiments involving three biometric traits and their multimodal combinations, with both handcrafted and deep learning features, demonstrate that SRC-based verification significantly outperforms many well-known methods based on one-to-one matching in both unimodal and multimodal scenarios.
3. We empirically confirm a strong correlation between verification accuracy and the inter-class separability among classes in the coding dictionary. SRC-based verification is particularly suitable for scenarios using advanced deep learning features and multiple biometrics, which avoid the long-tail effect of the receiver-operating characteristic (ROC) curve.
4. We propose to shrink the coding dictionary to a certain small scale using a cluster analysis and random selection strategy. Dictionary shrinkage avoids the massive computational cost and accuracy degradation in large-scale user applications.
The rest of the paper is organized as follows. In the section “Related work”, we briefly review the SRC techniques and the verification using SRC. In the section “Datasets”, we introduce the face, palmprint, and ear datasets, and three chimeric multimodal datasets. In the section “SRC-based biometric verification”, we first present the methodology of SRC-based verification and two sparsity-based matching measures. Second, we discuss its features and challenges. Third, we explore the solutions using dictionary shrinkage and multimodal extension. In the section “Experiments”, we report our experimental results and analysis. Finally, we draw conclusions and provide some research directions for future work in the section “Conclusions and future directions”.

Sparse representation-based classification

Wright et al. [14] first put forward the SRC model and showed its significant improvement in face identification. Its success has largely boosted the research of biometric recognition based on sparse representation and collaborative representation [5]. A number of variants and extensions of SRC have been proposed in the last decade. Meanwhile, many works have also paid attention to the source of their discriminative ability [5, 19, 20, 25, 39].
A major direction is to explore its capacity to handle complex variations like illumination, pose, corruption, and occlusion. Yang et al. [40] proposed a Gabor-based SRC (GSRC) using Gabor features, which was shown to be more robust against illumination changes and pose mismatches. They also proposed a robust sparse coding (RSC) model by regarding sparse coding as a sparsity-constrained regression problem [15]. RSC can effectively estimate the corrupted pixels and occluded regions and then exclude them from the sparse representation in an iterative process. Zhou et al. [41] proposed to detect contiguous occluded regions in the test image using Markov random fields. Illiadis et al. [42] proposed a fast low-rank and iterative reweighted nonnegative least-squares algorithm, namely F-LR-IRNNLS, to address the problem of contiguous occlusions. F-LR-IRNNLS considers that the error image is low-rank relative to the image size and follows a distribution that can be described by a tailored potential loss function. Lai et al. [43] proposed a modular weighted global sparse representation (WGSR) method that divides an image into modules, sparsely encodes each module separately, and then dynamically combines their reconstruction errors based on their reliability for the final classification. Lai et al. [44] proposed a collaborative patch framework using class-wise sparse representation (CSR-CP) to tackle the problem of uncontrolled training data. CSR-CP optimizes all patches together to seek a groupwise sparse representation by putting all patches of an image into a group.
Although SRC and its extensions have significantly improved the robustness of biometric identification, they are often criticized for their harsh requirements on the quality and number of training samples per subject and for their poor efficiency in solving the sparse optimization problem in large-scale scenarios. SRC requires sufficient and well-controlled training samples per user to maintain its superior performance [14]. However, in real-world applications, the training data often contain a large number of identities, and sufficient representative images for every identity cannot be guaranteed. To solve this insufficient training samples problem, or so-called under-sampled problem, a lot of effort has been made in the community. Deng et al. proposed several dictionary augmentation methods to enhance the representation ability of the gallery dictionary, including extended SRC (ESRC) [18], superposed SRC (SSRC) [19], and the superposed linear representation classifier (SLRC) [20]. They take advantage of intra-class variation, class centroids, and the sample-to-centroid difference to construct the coding dictionary. Gao et al. [45] proposed a semi-supervised SRC (S3RC) using a variation dictionary to represent the linear variation of the test sample and a gallery dictionary learned with a Gaussian mixture model (GMM) to represent the non-linear variation. Jiang et al. [46] proposed a sparse- and dense-hybrid representation model based on a supervised low-rank dictionary decomposition/learning, aiming to alleviate the under-sampled problem and the uncontrolled training data problem simultaneously. Yang and Wang et al. paid attention to more fine-grained part-based methods [22, 23]. The face image is divided into multiple overlapping patches, centered around 5 facial keypoints and 16 regularly sampled facial points. A joint and collaborative representation is performed on the local dictionaries, each with an intra-class variation sub-dictionary, based on the local convolution or Gabor features for the final classification. The aforementioned methods and strategies have alleviated the problems of under-sampled and uncontrolled training data on small datasets to a certain extent, but their effectiveness on large-scale datasets remains to be tested.
Recall that the SRC methods use the training templates of all classes to construct the coding dictionary, and thus the computational cost of sparse optimization increases with the growth of the dictionary scale [14]. Therefore, how to improve efficiency is crucial in large-scale user applications. To alleviate this issue, Xu et al. [47] proposed a two-phase test sample representation method for face recognition. The first phase uses all of the training samples to represent the test sample with the more efficient L2-norm based collaborative representation and selects a limited number of "nearest neighbors" according to the representation ability of each training sample. Xu et al. [48] further improved the method using both the original and newly generated 'symmetrical face' samples of a small number of classes that are 'near' to the test sample to represent and classify it. He et al. [49] proposed to filter the database into a small subset based on the nearest-neighbor criterion in a learned robust metric, and then perform nonnegative sparse representation-based classification with a small dictionary. All the above methods use a two-stage strategy that selects a small subset from the entire database in some efficient way and then performs SRC using a dictionary built from the selected data. Although they can substantially reduce the computational cost compared with one-stage SRC methods, the data filtering over the whole dataset is still very time-consuming in large-scale user applications.

Verification using SRC

More than two decades ago, Verlinde et al. [50] proposed a one-to-many matching biometric verification method using a k-NN classifier. This is one of the pioneering attempts to consider non-target subjects in the test phase for verification. Cohort-based score normalization also takes advantage of non-target subjects, but serves the conventional one-to-one matching verification [51]. Nevertheless, verification using non-target subjects and one-to-many matching did not receive much attention. Recently, inspired by the great success of SRC-based identification, SRC has also been introduced into the unimodal verification of speaker, face, and finger vein [27–32], and the multimodal verification using face and ear [3].
In Ref. [27], GMM mean supervectors are used as features of an utterance to construct the coding dictionary. The L1-norm value of the coding coefficients associated with the claimed identity is used as the genuine score, while such L1-norm values for the other classes are impostor scores. Although their experiments did not show improved performance for the proposed SRC-based verification alone, its complementarity to the standard UBM-GMM classifier was clearly validated. Li et al. [28] built the coding dictionary using the i-vectors from total variability as atoms and evaluated three sparsity-based measures for speaker verification, including the L1-norm ratio (i.e., SCR), the L2 residual ratio, and a Bnorm L2 residual (a regularized SCE measure). The Bnorm L2 residual measure outperformed the other two measures in their experiments. The SRC-based verification approaches obtain inferior performance compared to the SVM classifier based on cosine similarity; however, improved verification results are achieved when combining the sparsity-based scores with the SVM results. In Ref. [29], Kua et al. also investigated i-vector-based SRC verification (iSRC) using the L1 constraint, the L2 constraint, and both constraints (Elastic net) in the coding optimization. The L1-norm ratio is used as the verification criterion, which was claimed to be superior to the other two measures proposed in Ref. [28]. They also validated that a small-size dictionary chosen based on column vector frequency can improve the verification accuracy and efficiency. Hasheminejad and Farsi [30] proposed to learn target, background, and noise dictionaries with orthogonal atoms and to concatenate them to construct an overcomplete dictionary for speaker verification. The derived Bnorm L2 residual scores are transformed to log-likelihood-ratio scores before the decision. They reported better verification performance than iSRC.
Xin et al. [31] also utilized SCE as the matching measure in finger vein verification and achieved a very low EER of 0.017% on a dataset with 600 fingers, which is better than many competitive methods in their experiments. Shin et al. [32] performed sparse representation of the test color face image on each channel of several color configurations and combined the class-specific reconstruction residuals (i.e., SCE) with the Sum fusion rule for verification. The approach surpasses one-to-one matching verification using LBP and Gabor features by a large margin of 12–22% EER on the CMU Multi-PIE and Color FERET face datasets. However, such a system, heavily relying on color channels, would be sensitive to facial appearance variations, illumination, and sensors in applications. Moreover, the EER results of 1.89% and 2.79% that they achieved are still rather high for real-world applications.
In Ref. [3], Huang et al. performed sparse representation on the face and ear modalities, respectively. The multimodal verification based on the summation of the SCE scores of the two modalities achieves about 0.2% EER on a multimodal dataset built with the AR face dataset and the USTB III ear dataset. However, the method is sensitive to worst-case partial spoof attacks. Aiming to improve the anti-spoofing performance, they also proposed to use the collaborative representation fidelity with non-target subjects to measure the affinity of the query sample to the claimed client. The resulting SCE scores and affinity scores of the two modalities are then combined in a stacked way to train an SVM classifier. The method was reported to have promising anti-spoofing performance while achieving a good trade-off between verification accuracy and anti-spoofing.
Most studies of SRC-based verification are found in the speaker verification community. Overall, the verification improvement brought by SRC-based verification there is limited, while incurring considerable computational cost. This could be attributed to the large intra-class variations and the small inter-class variations of speech signals, which somewhat violate the two preconditions of SRC application. According to the existing literature, SRC-based verification seems not to have received much attention in mainstream biometric communities like face, iris, and fingerprint. In this paper, we try to uncover the limitations of SRC-based verification and address the concerns about it in the community. We hope that our experimental results and findings will rekindle interest in SRC-based verification. Finally, we would also like to emphasize that SRC-based verification is different from the approaches in Refs. [52–54]. In those studies, sparse coding is used for feature extraction based on a learned dictionary, and the verification still operates in the conventional one-to-one matching way. This loses some of the benefits of competitive matching between the client and non-target subjects.

Datasets

In this paper, we study SRC-based verification using three modalities, i.e., face, palmprint, and ear, and their combinations. Note that SRC classifiers generally require multiple gallery samples per subject to construct an overcomplete dictionary for sparse coding [14, 55], if no dictionary augmentation or optimization skills like those in Refs. [18–23] are used. We thus select the publicly available Georgia Tech (GT) [56] and AR [57] face datasets, the PolyU 2D&3D palmprint dataset [58, 59], and the USTB III ear dataset [60]. Their compositions are summarized in Table 1. Figure 1 shows sample images of one user in each dataset. In USTB III, the samples in the red box are used as gallery, and the remainder are used as probes, except for the two images in the blue box with large pose variation.
Table 1
Summary of the face, palmprint, and ear datasets
Datasets | Face: GT    | Face: AR                | Palmprint: PolyU         | Ear: USTB III
Gallery  | 7 imgs/user | 7 imgs/user (Session 1) | 7 imgs/user (Session 1)  | 7 imgs/user
Probe    | 8 imgs/user | 7 imgs/user (Session 2) | 10 imgs/user (Session 2) | 11 imgs/user
Users    | 50          | 100                     | 100                      | 79
To evaluate SRC-based multimodal verification, we create three chimeric multimodal datasets by pairing subjects from datasets of different modalities. This is a widely used approach in the community for creating multimodal datasets [1, 4]. The underlying assumption is that different modalities can generally be considered physiologically independent. The adjacency of the face and the ear in physical location may lead to pose correlation between their samples, depending on the data collection setup and user cooperation. We take this issue into account by using a universal pairing protocol to produce more virtual multimodal samples. That is, one probe sample of a modality is paired with all the probe samples of another modality to form multiple multimodal probe samples. For example, for a virtual subject in MD III, the 7 face images from the AR Probe set are paired with the 10 palmprint images from the PolyU Probe set. We can then get 100 × 7 × 10 = 7000 multimodal probe samples for all 100 subjects, as shown in Table 2. This universal pairing protocol brings more instances for testing, covering the possible multimodal combinations as much as possible, while making the evaluation more challenging. Note that we use a unique pairing protocol to create the multimodal gallery samples.
Table 2
The constitution of the chimeric multimodal datasets
Datasets | Gallery components            | Gallery samples | Probe components          | Probe samples
MD I     | GT Gallery + USTB III Gallery | 50 × 7 = 350    | GT Probe + USTB III Probe | 50 × 8 × 11 = 4400
MD II    | AR Gallery + USTB III Gallery | 79 × 7 = 553    | AR Probe + USTB III Probe | 79 × 7 × 11 = 6083
MD III   | AR Gallery + PolyU Gallery    | 100 × 7 = 700   | AR Probe + PolyU Probe    | 100 × 7 × 10 = 7000
Table 2 summarizes the composition of the three chimeric multimodal datasets we created, namely MD I, MD II, and MD III. MD I and MD II use the face and ear traits, and MD III uses the face and palmprint traits. Note that we use the minimum number of users to construct a multimodal dataset when the unimodal datasets have different user scales: MD I uses the first 50 subjects of USTB III, and MD II uses the first 79 subjects of the AR face dataset. To differentiate these unimodal subsets from their full datasets, they are hereinafter denoted "USTB III (50)" and "AR (79)".
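To make the universal pairing protocol concrete, the sketch below (with hypothetical sample identifiers) enumerates the chimeric probe pairs for one virtual MD III subject by forming the Cartesian product of the two per-subject probe lists, yielding 7 × 10 = 70 pairs per subject and 7000 in total for 100 subjects.

```python
from itertools import product

def universal_pairing(probes_a, probes_b):
    """Pair every probe sample of modality A with every probe sample of modality B
    belonging to the same virtual subject (universal pairing protocol)."""
    return list(product(probes_a, probes_b))

# Hypothetical per-subject probe identifiers for MD III (AR face x PolyU palmprint)
face_probes = [f"ar_face_{k}" for k in range(7)]
palm_probes = [f"polyu_palm_{k}" for k in range(10)]

pairs = universal_pairing(face_probes, palm_probes)
assert len(pairs) == 70  # 100 subjects -> 100 x 70 = 7000 multimodal probe samples
```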
In our experiments using DCT features, all images of the three modalities are resized to 50 × 40 pixels for extracting a 200-D DCT feature. In the CNN experiments, all images are resized to 112 × 112 pixels before being fed into the ArcFace-based CNN networks. Unless otherwise specified, the experimental results used for illustration in the section "SRC-based biometric verification" are obtained with DCT features.
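The exact construction of the 200-D DCT feature is not detailed here; the sketch below assumes a common scheme, taking the 2-D DCT of the resized 50 × 40 image, keeping the 200 lowest-frequency coefficients in zigzag order, and l2-normalizing the result.

```python
import numpy as np
from scipy.fftpack import dct

def dct_feature(img, n_coeffs=200):
    """200-D DCT feature from a 50x40 grayscale image (assumed selection scheme:
    lowest-frequency coefficients in zigzag order, then l2 normalization)."""
    coeffs = dct(dct(img, type=2, norm='ortho', axis=0), type=2, norm='ortho', axis=1)
    h, w = coeffs.shape
    # Zigzag-style order: traverse anti-diagonals, low frequencies first.
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda ij: (ij[0] + ij[1], ij[0]))
    feat = np.array([coeffs[i, j] for i, j in order[:n_coeffs]])
    return feat / (np.linalg.norm(feat) + 1e-12)

# Example: a random 50x40 "image" produces a unit-norm 200-D feature vector.
feature = dct_feature(np.random.rand(50, 40))
assert feature.shape == (200,)
```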

SRC-based biometric verification

SRC model

Assume that there are sufficient, well-controlled training samples for each class in a dataset with \(K\) classes. For simplicity, suppose all classes have \(n\) training samples. Let \(\mathbf{A}_{i} = [\mathbf{a}_{i,1}, \mathbf{a}_{i,2}, \cdots, \mathbf{a}_{i,n}] \in R^{M \times n}\) be the matrix formed by the training samples of the ith class, and let \(\mathbf{A} = [\mathbf{A}_{1}, \mathbf{A}_{2}, \cdots, \mathbf{A}_{K}] \in R^{M \times N}\) (\(N = n \times K\)) be the dictionary composed of all training samples. Given a query sample \(\mathbf{y}\), it can be represented as \(\mathbf{y} = \mathbf{A}\boldsymbol{\alpha}\), where \(\boldsymbol{\alpha} \in R^{N}\) is the coding coefficient vector. If \(M \ll N\), a sparsest solution can generally be sought by solving the \(L_{0}\)-norm optimization problem
$$ \hat{\boldsymbol{\alpha}} = \arg\min \left\| \boldsymbol{\alpha} \right\|_{0} \quad \text{s.t.} \quad \left\| \mathbf{y} - \mathbf{A}\boldsymbol{\alpha} \right\|_{2} < \varepsilon, $$
(1)
where \(\left\| \cdot \right\|_{0}\) denotes the \(L_{0}\)-norm, and \(\varepsilon > 0\) is a constant.
Solving the \(L_{0}\) optimization problem in (1) is NP-hard and extremely time-consuming. However, if the solution of (1) is sparse enough, it is equal to the solution of the following \(L_{1}\)-norm optimization problem [14]:
$$ \hat{\boldsymbol{\alpha}} = \arg\min \left\| \boldsymbol{\alpha} \right\|_{1} \quad \text{s.t.} \quad \left\| \mathbf{y} - \mathbf{A}\boldsymbol{\alpha} \right\|_{2} < \varepsilon, $$
(2)
where \(\left\| \cdot \right\|_{1}\) denotes the \(L_{1}\)-norm. This problem can be solved in polynomial time by standard linear programming algorithms [61].
In our experiments, we use the l1_ls optimization method [61] to solve the sparse coding problem. Once \(\hat{\boldsymbol{\alpha}}\) is obtained, the class-specific sparse coding error (SCE) of the ith class can be calculated using the coefficients associated with the ith class as follows:
$$ r_{i}(\mathbf{y}) = \left\| \mathbf{y} - \mathbf{A}_{i}\,\delta_{i}(\hat{\boldsymbol{\alpha}}) \right\|_{2}, $$
(3)
where \(\delta_{i}: R^{N} \to R^{n}\) is the characteristic function that selects the coefficients associated with the ith class.
SRC and most of its extensions identify a query sample by sorting all the resulting SCE scores and assigning it to the class with the least SCE score.
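To make this pipeline concrete, the sketch below solves an L1-regularized approximation of (2) with scikit-learn's Lasso (a stand-in for the l1_ls solver used in our experiments; the regularization weight is illustrative) and then computes the class-specific SCE scores of (3).

```python
import numpy as np
from sklearn.linear_model import Lasso  # L1-regularized least squares, a stand-in for l1_ls

def src_identify(A, labels, y, lasso_alpha=0.01):
    """Classify query y against dictionary A whose columns are training samples.

    A           : (M, N) array, columns assumed l2-normalized
    labels      : (N,) array with the class label of each dictionary atom
    y           : (M,) query feature vector
    lasso_alpha : L1 regularization weight (illustrative value)
    Returns the predicted class and the per-class SCE scores r_i(y) of Eq. (3).
    """
    # Approximate Eq. (2): minimize ||y - A a||_2^2 + lambda * ||a||_1.
    coder = Lasso(alpha=lasso_alpha, fit_intercept=False, max_iter=10000)
    coder.fit(A, y)
    alpha_hat = coder.coef_

    residuals = {}
    for c in np.unique(labels):
        mask = labels == c                          # delta_c: keep only class-c coefficients
        recon = A[:, mask] @ alpha_hat[mask]        # class-specific reconstruction
        residuals[c] = np.linalg.norm(y - recon)    # SCE score r_c(y)

    return min(residuals, key=residuals.get), residuals  # class with the least SCE
```

In the SRC-based verification described next, only the residual associated with the claimed identity would be compared against an operating threshold.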

SRC-based verification

Rather than examining the class-specific matching scores of all classes in the dictionary, SRC-based verification calculates only the matching score of the class associated with the claimed identity and then compares it with a predetermined operating threshold to output acceptance or rejection. Figure 2 shows the flowchart of SRC-based unimodal verification. A client claims an identity, and the corresponding sub-dictionary is used to build an overcomplete dictionary together with the sub-dictionary of the non-target subjects.
Suppose \(\mathbf{A}_{c} = [\mathbf{a}_{c,1}, \mathbf{a}_{c,2}, \cdots, \mathbf{a}_{c,n}]\) is the sub-dictionary of the claimed client and \(\mathbf{A}_{b} = [\mathbf{A}_{1}, \mathbf{A}_{2}, \cdots, \mathbf{A}_{K-1}]\) is the sub-dictionary composed of the gallery samples of the involved non-target subjects (the background dictionary); the overall dictionary can then be rewritten as \(\mathbf{A} = [\mathbf{A}_{c}, \mathbf{A}_{b}]\). In our experiments, unless otherwise specified, the coding dictionary \(\mathbf{A}\) is composed of the gallery samples of all subjects in each dataset. The SCE score associated with the claimed identity can be calculated as follows:
$$ r_{c}(\mathbf{y}) = \left\| \mathbf{y} - \mathbf{A}_{c}\,\delta_{c}(\hat{\boldsymbol{\alpha}}) \right\|_{2}. $$
(4)
The superior identification performance of SRC and its extensions has validated that the SCE, as a distance measure, is a good candidate for measuring the correlation between a query sample and a specific class. Thus, it is reasonable to use the SCE for verification. Considering the binary classification in the verification decision, the output is either acceptance or rejection, denoted by 1 and 0, respectively. Given an operating threshold \(\theta_{sce}\), the verification rule with SCE can be written as
$$ d\left( r_{c}(\mathbf{y}) \right) = \begin{cases} 1 & \text{if } r_{c}(\mathbf{y}) \le \theta_{sce} \\ 0 & \text{if } r_{c}(\mathbf{y}) > \theta_{sce} \end{cases}. $$
(5)
The sparsity concentration index (SCI) is a measure proposed along with SRC and the SCE measure [14]. SCI measures how well localized the coding coefficient vector itself is. SCI is close to 1 when the query sample is encoded using only the dictionary atoms of a single class, whereas it is close to 0 if the coefficients spread evenly over all classes. We refer readers to Ref. [14] for its detailed formulation. SCI is often used, as a measure complementary to SCE, to validate whether a query sample is a valid sample from the subjects in the coding dictionary.
Essentially, SCI depends on the class that contributes the most to the sparse coding; its value is the largest sparse contribution rate (SCR) among those of all classes in the dictionary. SCR reflects the degree of participation of a specific class in representing the query sample. A larger SCR value indicates a higher probability that the query sample belongs to that class. Therefore, SCR can also be used as a similarity measure for verification. The SCR score associated with the claimed class can be calculated as follows:
$$ \rho_{c}(\hat{\boldsymbol{\alpha}}) = \left\| \delta_{c}(\hat{\boldsymbol{\alpha}}) \right\|_{1} \big/ \left\| \hat{\boldsymbol{\alpha}} \right\|_{1}. $$
(6)
Apparently, \(\rho_{c}(\hat{\boldsymbol{\alpha}}) \in [0,1]\). The verification rule with a given threshold \(\theta_{scr}\) can be written as
$$ d\left( \rho_{c}(\hat{\boldsymbol{\alpha}}) \right) = \begin{cases} 0 & \text{if } \rho_{c}(\hat{\boldsymbol{\alpha}}) \le \theta_{scr} \\ 1 & \text{if } \rho_{c}(\hat{\boldsymbol{\alpha}}) > \theta_{scr} \end{cases}. $$
(7)
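To make (4)–(7) concrete, the minimal sketch below computes the claimed-class SCE and SCR from a solved coding vector and applies the two threshold rules; the threshold arguments are placeholders that would be calibrated on genuine and impostor score distributions (e.g., at the EER operating point).

```python
import numpy as np

def sce_score(A, labels, alpha_hat, y, claimed):
    """Sparse coding error of the claimed class, Eq. (4)."""
    mask = labels == claimed
    return np.linalg.norm(y - A[:, mask] @ alpha_hat[mask])

def scr_score(labels, alpha_hat, claimed):
    """Sparse contribution rate of the claimed class, Eq. (6)."""
    mask = labels == claimed
    return np.abs(alpha_hat[mask]).sum() / (np.abs(alpha_hat).sum() + 1e-12)

def verify_sce(r_c, theta_sce):
    """Eq. (5): accept (1) iff the claimed-class SCE does not exceed the threshold."""
    return int(r_c <= theta_sce)

def verify_scr(rho_c, theta_scr):
    """Eq. (7): accept (1) iff the claimed-class SCR exceeds the threshold."""
    return int(rho_c > theta_scr)
```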
Figure 3 shows the distributions of the SCE and SCR scores obtained on the AR face dataset. For SCE, most genuine scores fall in [0, 0.5], while the impostor scores concentrate around 1.0. In contrast, almost all SCR impostor scores are close to 0, while the genuine SCR scores spread over a wide range; compared with SCE, the overlap between the genuine and impostor distributions is rather evident. This implies that verification based on SCE should be better, which will be demonstrated in the section "Experiments". The disadvantage of SCR possibly originates from the fact that (2) is solved by choosing \(\boldsymbol{\alpha}\) to minimize the overall coding error rather than to maximize the SCR [14].
Note that some variants of SCE and SCR have been used for speaker verification in Refs. [28, 29], and Kua et al. [29] found SCR to be the best measure in their speaker verification experiments. SCE was also selected for finger vein and face verification in Refs. [31, 32]. Moreover, SCE and SCR have already been investigated in a variety of biometric identification applications [40, 55, 62]. SCE signals the representation ability of SRC, while SCR reflects the sparseness of the coding vector. They are more general and representative, and thus more suitable for exploring the characteristics of SRC-based verification in this paper.

Characteristics and merits

In contrast to conventional one-to-one matching verification, SRC-based verification conducts one-to-many matching between the query sample and the templates of the client and the non-target subjects. It provides a competing mechanism, i.e., sparse coding optimization, which allocates class-specific sparsity-based matching scores to all classes in the dictionary according to their correlations with the submitted data. Moreover, with the sparsity constraint, SRC generally allows only one or very few classes to obtain a convincing sparsity-based matching score, while the remaining classes get very inferior scores, as shown in the upper plot of Fig. 4. In other words, to be accepted, the genuine class needs to defeat almost all the non-target classes in the competing coding process. Therefore, an acceptance response made by SRC-based verification should be more convincing than one based on one-to-one matching. Overall, SRC-based verification not only examines the matching score obtained by the client, but also implicitly compares the correlations of the query data to the client and many non-target subjects, and thereby offers enhanced protection for identity security.
Figure 5 shows the SCE and SCR score distributions obtained on the GT face dataset. We divide the genuine scores into two categories: the first comprises the top 5 scores among all matching scores obtained in a verification process, and the second comprises the remainder. The distributions of the two categories are outlined with red and blue lines in Fig. 5. It is quite clear that, in both the SCE and SCR cases, the top 5 genuine score distribution has only a trivial overlap with the impostor score distribution. This means that a top rank usually comes along with a favorable genuine score, and vice versa. In contrast, the genuine scores outside the top 5 are so inferior that they all lie close to the impostor distribution center. These results show that if the genuine class can defeat most of the non-target classes, the SRC-based verification system will generally accept the verification request. Hence, the rank information among the client and non-target subjects is implicitly employed by SRC-based verification, embedded in the sparsity-based matching score. This implies that the sparsity-based matching scores obtained from competitive matching are more discriminative than scores that come from one-to-one comparison, e.g., Euclidean distance and cosine similarity.
Generally, compared with the non-target subjects, a genuine query sample (or feature) of good biometric quality is more similar to the claimed client and can hence win an eligible score in the competition, superior to the predetermined operating threshold. If the submitted biometric sample is unreliable, for example, captured from an intruder or of poor biometric quality, it is likely that none of the classes can dominate the sparse coding competition. Consequently, the coding coefficients will spread evenly over all classes [14]. The claimed client can only obtain an inferior score and is rejected. This is often the case for the fake biometric traits used in spoof attacks without sophisticated biometric fabrication. Besides the security improvement, SRC-based verification can also achieve a significant advantage in verification accuracy in both unimodal and multimodal scenarios, compared with conventional one-to-one matching verification. The experimental results will be presented in the section "Experiments".

Challenges

However, biometric sample quality often fluctuates with illumination, pose, and appearance variations [33–35]. Moderate data degeneration is inevitable in real-world applications. The submitted sample could be considerably different from the gallery samples of the claimed client, or even somewhat similar to those of non-target subjects. Under these circumstances, the genuine client may fail to achieve a top rank in the competing sparse coding. As a result, the client will get an extremely inferior sparsity-based genuine score, as shown in the lower plot of Fig. 4. We can see in Fig. 5 that, although the distribution overlap between the genuine and impostor scores is not evident, a certain number of genuine scores spread near the impostor score distribution center.
Moreover, these genuine scores are so inferior that, in real-world applications, it is impossible to accept the corresponding verification requests by relaxing the operating threshold without incurring a high false accept rate. Figure 6 shows the ROC curves of SRC-based unimodal verification using SCE and SCR on all unimodal datasets. In the SCE case, they either suffer very high false reject rates over a wide range of FAR values or their ROC curves barely drop at all after 10% FAR. In the SCR case, although the ROC curve on the AR face dataset finally converges to 0% FRR, the corresponding 44% FAR is unacceptable in real applications. We call the phenomenon that the ROC curve cannot converge to 0% FRR, or has a very long and flat tail along the FAR axis, the FRR bottleneck problem. An evident FRR bottleneck problem will degrade user experience and may even make SRC-based verification unacceptable.
Due to the involvement of non-target subjects, SRC-based verification meets another challenge: computational cost. Suppose the atom dimensionality \(M\) in \(\mathbf{A} \in R^{M \times N}\) is fixed; the complexity of the sparse coding optimization then depends on the number of atoms \(N\). For example, the empirical complexity of the commonly used l1_ls optimization method for solving (2) is \(O(N^{v})\) with \(v \approx 1.5\) [15, 61]. In applications with large-scale users, if the gallery samples of all enrolled clients are used to build the coding dictionary, the computational cost would be prohibitively expensive.
Furthermore, it is well acknowledged that an increase in dictionary scale usually leads to accuracy degradation in SRC-based identification [36, 37, 39]. Likewise, SRC-based verification may confront the same challenge in applications with a large user scale. The more non-target subjects are involved in the sparse coding, the higher the possibility of increasing the distribution overlap in the feature subspace between the sub-dictionaries of the genuine class and the non-target classes, i.e., \(\mathbf{A}_{c}\) and \(\mathbf{A}_{b}\).
As illustrated in Fig. 7a, the convex hull spanned by biometric samples is only an extremely tiny portion of the unit sphere \(S^{M-1}\), and the training vectors are tightly bundled together as a "bouquet" [63]. Within the overall convex hull, the distribution interval among classes is very small, and many classes may overlap with their neighbors to some degree. In reality, appearance variations, pose, and alignment errors further aggravate the distribution overlaps. Accordingly, having more classes in the dictionary may introduce more distribution overlaps. The inter-class separability between the genuine class and the overall non-target classes plays a critical role in recognition [20]. In the section "Experiments", we empirically confirm the evident correlation between the inter-class separability and SRC-based verification accuracy.
Overall, SRC-based verification has to contend with the following challenges. (1) Once the genuine class fails to obtain a top rank in encoding the query data, it is very likely to receive an extremely inferior genuine score; consequently, SRC-based verification may suffer from an evident FRR bottleneck problem. If this problem cannot be resolved or avoided properly, SRC-based verification may not be suitable for some biometrics and for application scenarios that require both very low FRR and very low FAR. (2) Owing to the involvement of non-target subjects, the larger the coding dictionary, the more likely it is that verification efficiency and accuracy will degrade.
One may also notice that SRC techniques generally require multiple training samples per class for dictionary construction [14, 28]. This requirement is rather harsh and even impractical for some identification tasks. However, SRC-based verification is designed for positive recognition scenarios where user cooperation is generally available [1]. A representative set of biometric samples per user can be captured in the registration phase. If the collected gallery samples are not sufficient, there are still many ways to generate simulated biometric samples for users based on their enrolled data [64–66]. The 3D face models [64] or generative adversarial network (GAN) models [65, 66] learned from gallery samples can be used to generate samples with variations in, for example, pose and illumination. Dictionary augmentation skills such as supplementing an intra-class variation dictionary can also help alleviate this problem [18, 20]. Besides, for the non-target subjects, SRC-based verification does not require their gallery samples to be equivalent in number or representativeness.

Small random dictionary

In application scenarios with a large user scale, if the gallery samples of all enrolled subjects are used to build the coding dictionary, it seems inevitable that verification accuracy will degrade while computational cost increases significantly. In this subsection, we first clarify that using a large number of non-target subjects to construct the dictionary is unnecessary. Then, we propose a straightforward but effective strategy to shrink the dictionary via cluster analysis and random selection.
The non-target subjects used in the coding dictionary play a critical role in SRC-based verification. Security improvement is achieved through the one-to-many competitive matching among the client and the non-target subjects. The more non-target subjects engage in the competition, the more reliable an "acceptance" decision becomes. However, more non-target subjects also increase the intensity of the competition, thereby increasing the likelihood of falsely "rejecting" genuine clients, while inevitably leading to a higher computational burden. From these viewpoints, an excessive number of non-target subjects is not only unnecessary but can also cause negative effects.
To explicitly illustrate the risk of using excess non-target subjects for dictionary construction, we plot two toy examples in Fig. 7. Suppose there are 6 subjects with 4 training samples per class and a feature dimension of 2; the convex hull is depicted in Fig. 7b. Consider an \(L_{0}\)-norm sparse coding problem with sparsity level \(K = 6\): the percentage of the K nonzero entries obtained by a class reflects the probability of the query sample belonging to that class, similar to the SCR measure in \(L_{1}\)-norm optimization. Given a query sample of Class 1 on its distribution boundary, if we randomly select half of the classes and use all their training samples as dictionary atoms, 3 training samples of Class 1 could be used to represent the query sample, as illustrated in Fig. 7c. In this case, the genuine score is \(3/K = 0.5\). On the other hand, if the training samples of all classes are used for coding, the genuine score may become smaller, e.g., \(2/K \approx 0.33\), as shown in Fig. 7b, which is much lower than the 0.5 achieved in Fig. 7c. This might not be a problem in closed-set identification, since Class 1 still obtains the top rank. In verification, however, such a low score is inclined to cause a false rejection when compared with a predefined operating threshold.
Considering the above observations and analyses, it is inadvisable to use a large number of biometric samples of the enrolled subjects to build the coding dictionary for verification. Here, we consider a simple dictionary shrinkage strategy via cluster analysis and random selection. The basic idea is to first conduct cluster analysis on the training samples of all enrolled subjects, and then randomly select a few subjects from each cluster and use their training samples to construct an overcomplete dictionary with a limited number of classes. Cluster analysis is applied to avoid the worst-case scenario where all the selected subjects are concentrated in a tiny region of the feature space and their distributions overlap heavily. To differentiate it from the full dictionary with all the enrolled subjects, we call such a dictionary a small random dictionary.
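One possible realization of this strategy, not necessarily the exact implementation used in our experiments, is sketched below: subjects are clustered by their gallery centroids with k-means, a few subjects are drawn at random from each cluster, and the claimed client's sub-dictionary is always included; the cluster count and per-cluster selection size are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def small_random_dictionary(gallery, claimed_id, n_clusters=10, per_cluster=5, seed=0):
    """Build a small coding dictionary A = [A_c, A_b] for one verification request.

    gallery    : dict mapping subject id -> (M, n) array of gallery samples (columns)
    claimed_id : identity claimed by the user; its sub-dictionary A_c is always kept
    Returns the dictionary matrix and the subject label of each column.
    """
    rng = np.random.default_rng(seed)
    ids = [i for i in gallery if i != claimed_id]
    centroids = np.stack([gallery[i].mean(axis=1) for i in ids])  # one centroid per subject

    # Cluster subject centroids so the selected non-target subjects cover the feature space,
    # then randomly pick a few subjects per cluster.
    cluster_of = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(centroids)
    selected = []
    for c in range(n_clusters):
        members = [ids[k] for k in range(len(ids)) if cluster_of[k] == c]
        if not members:
            continue
        picked = rng.choice(len(members), size=min(per_cluster, len(members)), replace=False)
        selected.extend(members[j] for j in picked)

    cols = [gallery[claimed_id]] + [gallery[i] for i in selected]
    labels = [claimed_id] * gallery[claimed_id].shape[1]
    for i in selected:
        labels += [i] * gallery[i].shape[1]
    return np.hstack(cols), np.array(labels)
```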
In our experiments, all query samples are used for testing, including those of the subjects selected for dictionary construction. The coding dictionaries for the selected and unselected classes are not identical but equivalent, that is, they share the same \(\mathbf{A}_{b}\). For example, 50 subjects are selected in the PolyU dataset. In the testing phase, the query samples of these classes are encoded with the dictionary built from them. When a query sample belongs to one of the remaining classes, one of the 50 subjects is replaced by the claimed class to construct the dynamic coding dictionary. Hence, the resulting sparsity-based matching scores are compatible. By conducting experiments with different small random dictionaries of the same scale on the unimodal and multimodal datasets, we find that they consume much less time and bring comparable or even better verification accuracy than the full dictionary. This evidence supports that shrinking the dictionary by selecting non-target classes and training samples is a feasible way to avoid the heavy computational burden and recognition accuracy degradation on large-scale datasets.
The benefits of the small random dictionary are summarized as follows. First and foremost, a smaller dictionary scale means lower sparse coding complexity and thus makes SRC-based verification more efficient. Second, compared with a large-scale dictionary, a small-scale dictionary is less likely to introduce distribution overlaps among classes. Besides, the strategy is simple and requires no complicated training or preprocessing.

Multimodal verification

According to the analyses above, SRC-based verification is particularly suitable for application scenarios where good inter-class separability is available. It is well acknowledged that classes are better separated in a multimodal feature space than in a unimodal one [1, 3, 4, 33, 34, 67]. Besides, it is much more difficult to spoof multiple biometric modalities than to spoof only one [4]. In this paper, we study SRC-based multimodal verification with the combinations of face and ear, and face and palmprint, using the Sum fusion rule.
The ear is located near the face and can be captured along with the face using the same type of sensor, or by a single sensor at two time instants. Face detection can also help speed up ear detection by offering an ear region of interest. Most popular face feature extraction and classification techniques are applicable to the ear and the palmprint, and recognition systems using the ear and palmprint are also contactless. Besides, the ear has several appealing merits over the face: it has a stable structure with rich information and is nearly unaffected by aging and expressions [68, 69]. Although there is a common impression that human ears are often occluded by hair, this can be avoided via user cooperation in the verification scenario. The studies in Refs. [33, 34] have already validated that multimodal identification with face and ear can significantly improve recognition accuracy and robustness. Compared with the face and ear, it is much harder to steal a person's palmprint.
Suppose \(\mathbf{A}^{f} = [\mathbf{A}_{c}^{f}, \mathbf{A}_{b}^{f}]\) and \(\mathbf{A}^{e} = [\mathbf{A}_{c}^{e}, \mathbf{A}_{b}^{e}]\) are the face and ear coding dictionaries, respectively. The SCE and SCR matching scores of the face and the ear can be calculated using (4) and (6), respectively. As shown in Fig. 8, the proposed SRC-based multimodal verification system first performs two independent sparse coding procedures and then integrates the derived sparsity-based matching scores. Since the SCE (or SCR) scores of the face and the ear have similar distributions, we directly combine them without score normalization, which empirically brings no improvement in our experiments.
For convenience, let \(s^{f}\) and \(s^{e}\) be the sparsity-based matching scores of the two modalities. The multimodal matching score with Sum fusion is calculated as \(s = s^{f} + s^{e}\). The multimodal verification system then makes its decision with a rule similar to (5) or (7), according to the measure used.
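A minimal sketch of the Sum-rule fusion and the subsequent decision, assuming the two unimodal sparsity-based scores have already been computed (note that the decision direction for SCE, a distance, is the reverse of that for SCR, a similarity):

```python
def fuse_and_verify(s_face, s_ear, theta, measure="sce"):
    """Sum-rule fusion of two unimodal sparsity-based scores, then thresholding.

    For SCE (a distance measure) the request is accepted when the fused score is
    small; for SCR (a similarity measure) it is accepted when the fused score is large.
    """
    s = s_face + s_ear
    return int(s <= theta) if measure == "sce" else int(s > theta)
```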
Figure 9 plots the distributions of the multimodal SCE and SCR scores obtained on the MD I dataset. The overlap between the genuine and impostor distributions is trivial in both cases. In particular, the genuine and impostor multimodal SCE scores show evident distribution centers that are far apart from each other. This implies good robustness of the multimodal verification system. The detailed experimental results will be given in the section "Experiments".

Experiments

Settings

For convenience, we denote the SRC-based verification methods with SCE and SCR as SRC_sce and SRC_scr, respectively. The l1_ls optimization [61] is used to solve the sparse coding problem. The unimodal verification method and the multimodal verification method with Sum fusion [70], both based on one-to-one matching and cosine similarity, are used as the unimodal and multimodal baselines. The multimodal SRC_sce and SRC_scr are also compared with the well-known multimodal methods LLR [38] and SVM [71]. The multimodal SVM method fuses the matching scores of all modalities in a stacked way and uses an RBF kernel with a sigma of 0.25. These competitors also use cosine similarity scores and are evaluated with tenfold cross-validation.
In the experiments using ArcFace-based CNN features, the publicly available pretrained ResNet 50 model is finetuned for each modality. This model was trained on the MS1M-ArcFace dataset with the ArcFace loss. Note that we revise the network output to be a 200-D feature embedding; hence, we use third-party data and some gallery samples of our datasets to finetune the networks separately. We use 2 gallery samples/subject of AR, 3 gallery samples/subject of GT, and a small subset of CASIA-WebFace to finetune the face model. The ear network is finetuned with 4 gallery samples/subject of USTB III and 3352 ear samples of 300 subjects collected from college students. The palmprint network is finetuned with 3000 gallery samples of the remaining 300 subjects of the PolyU 2D&3D palmprint dataset. In finetuning, the batch size is 32, the initial learning rate is 0.001, the weight decay is 0.0005, and the momentum is 0.9. Each model is trained on one NVIDIA RTX 2080ti GPU card.
For each type of sparsity-based matching measure, according to (4) and (6), given \(K\) classes in the coding dictionary, we obtain one genuine score and \(K - 1\) impostor scores for a probe sample. For the SRC-based multimodal verification, we get 4400 genuine scores and 4400 × (50 − 1) = 215,600 impostor scores on MD I, 6083 genuine scores and 6083 × (79 − 1) = 474,474 impostor scores on MD II, and 7000 genuine scores and 7000 × (100 − 1) = 693,000 impostor scores on MD III. For the competing methods, we empirically select the best matching score from the comparisons of a probe sample with all training samples of a class; hence, the same numbers of genuine and impostor scores are available. All experiments except CNN feature extraction are conducted on the Matlab platform on a desktop with a 3.3 GHz CPU and 64 GB RAM.
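For reference, EER values such as those reported in Tables 3 and 4 can be computed from the genuine and impostor score sets as in the generic sketch below (written for distance-like SCE scores, where genuine scores should be small); it is not the exact evaluation code used here.

```python
import numpy as np

def eer_from_scores(genuine, impostor):
    """Approximate equal error rate for distance-like scores (accept if score <= t).

    Sweeps all observed score values as thresholds and returns max(FAR, FRR) at the
    point where the two rates cross, a common approximation of the EER.
    """
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    best = 1.0
    for t in np.unique(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine > t)      # genuine samples rejected
        far = np.mean(impostor <= t)    # impostor samples accepted
        best = min(best, max(far, frr))
    return best
```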

Results and discussion

Unimodal verification

Tables 3 and 4 report all the unimodal and multimodal verification results in terms of EER. In the unimodal experiments using DCT features, compared with the baseline, both SRC_sce and SRC_scr reduce the EER by about 14 percentage points on all three face datasets, and by roughly 10 and 5 percentage points in the ear and palmprint cases, respectively. The unimodal SRC_sce performs much better than SRC_scr on all datasets except PolyU. Impressively, both the unimodal SRC_sce and SRC_scr evidently outperform all the multimodal methods using cosine similarity.
Table 3
Biometric verification performance (EER, %) when using DCT features (multimodal results are reported once per multimodal dataset)
Datasets             | Unimodal: Baseline | SRC_scr | SRC_sce | Multimodal: Sum | SVM   | LLR   | SRC_scr | SRC_sce
MD I / GT            | 18.766             | 5.868   | 4.857   | 9.251           | 6.034 | 6.568 | 0.773   | 0.545
MD I / USTB III (50) | 12.981             | 2.509   | 1.636   |                 |       |       |         |
MD II / AR (79)      | 17.131             | 2.497   | 1.972   | 9.69            | 6.44  | 6.785 | 0.395   | 0.195
MD II / USTB III     | 13.031             | 2.201   | 1.473   |                 |       |       |         |
MD III / AR          | 17.344             | 2.857   | 2.429   | 14.183          | 3.12  | 3.371 | 0.155   | 0.125
MD III / PolyU       | 6.642              | 1.02    | 1.203   |                 |       |       |         |
Table 4
Biometric verification performance (EER, %) when using CNN features (multimodal results are reported once per multimodal dataset)
Datasets             | Unimodal: Baseline | SRC_scr | SRC_sce | Multimodal: Sum | SVM   | LLR   | SRC_scr | SRC_sce
MD I / GT            | 1.579              | 0.75    | 0.304   | 0.162           | 0.139 | 0.149 | 0.023   | 0.018
MD I / USTB III (50) | 1.610              | 0.84    | 0.364   |                 |       |       |         |
MD II / AR (79)      | 1.602              | 0.182   | 0.181   | 0.143           | 0.136 | 0.137 | 1.48e-3 | 6.32e-4
MD II / USTB III     | 1.595              | 0.653   | 0.361   |                 |       |       |         |
MD III / AR          | 1.571              | 0.326   | 0.286   | 0.073           | 0.067 | 0.063 | 4.33e-4 | 1.44e-3
MD III / PolyU       | 1.071              | 0.329   | 0.201   |                 |       |       |         |
When using CNN features for verification, as shown in Table 4, all the methods improve significantly. On the AR dataset, the unimodal baseline obtains an EER less than 1/11 of that obtained with DCT features. The advantage of SRC-based verification over the baseline is also significant: the baseline EERs are about 4.4 to 8.9 times those obtained by SRC_sce on the same datasets. Moreover, the unimodal SRC_sce consistently outperforms the unimodal SRC_scr.
Recall that all the unimodal SRC-based verification methods confront an evident FRR bottleneck problem when using DCT features, as shown in Fig. 6. In contrast, as the ROC curves in Fig. 10 show, the unimodal SRC_sce using CNN features does not suffer from this problem on any dataset. Although the FRR bottleneck problem still affects the unimodal SRC_scr on the GT dataset, it becomes rather trivial compared with that observed in the DCT experiments. This result implies that more discriminative features can alleviate or even avoid the FRR bottleneck problem in SRC-based verification.

Multimodal verification

In the right columns of Tables 3 and 4, the multimodal SRC_sce and SRC_scr are compared with their multimodal competitors. In the experiments with DCT features, the multimodal SRC_sce obtains the best EER results of 0.545%, 0.195%, and 0.125% on MD I, MD II, and MD III, respectively, while the best results obtained by the conventional multimodal methods are only 6.034%, 6.44%, and 3.12%. The multimodal SRC_scr also performs significantly better than the conventional methods on all datasets, though it does not match the multimodal SRC_sce. The comparison of ROC curves in Fig. 11 visually demonstrates their significant superiority over LLR and SVM. Note that we plot the ROC curves of the SRC-based methods and of the methods using cosine similarity in separate panels to show more detail. We would also like to mention that the inferior performance of LLR and SVM reflects the challenges caused by expression, illumination, and pose variations in the face, ear, and palmprint samples.
When using CNN features, we can see from Table 4 that the SRC-based multimodal verification achieves extraordinary improvements. Both SRC_sce and SRC_scr obtain promising EER results ranging from 4.33e-4% to 1.48e-3% on MD II and MD III, and their ROC curves almost completely overlap, as shown in Fig. 12. Note that even the best EER results of LLR and SVM on these two datasets are 0.136% and 0.063%, respectively. We also do not observe the FRR bottleneck phenomenon on the ROC curves of SRC_sce and SRC_scr in Fig. 12.
Overall, all multimodal methods significantly improve upon their unimodal counterparts in the experiments with both DCT and ArcFace-based CNN features. These results validate that the proposed SRC-based multimodal methods significantly outperform both their unimodal counterparts and the well-known conventional multimodal methods.

Correlation with inter-class separability

Wright et al. attributed the success of SRC to its ability to better exploit the actual (possibly multimodal and nonlinear) distributions of the training samples of each class, which makes it more discriminative among multiple classes [14]. Note that the biometric quality of the query samples and the inter-class separability are the two major factors that affect biometric recognition performance. As shown in Fig. 1, the biometric quality of the face, ear, and palmprint probe samples used here is roughly comparable. The performance of SRC-based identification on each dataset can, to some extent, reflect the inter-class separability of the samples in the coding dictionary [20]. Recall that SRC-based verification and identification share the same comparison mechanism. Therefore, we use the commonly used rank-1 recognition rate and the overall Cumulative Match Characteristic (CMC) curve of SRC-based identification as inter-class separability indicators. Note that the SRC-based multimodal identification method evaluated here uses the SCE measure and Sum fusion, as does the proposed multimodal SRC_sce verification method.
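As a reference for how these indicators are obtained, the sketch below computes the rank-1 accuracy and a CMC curve from a probe-by-class similarity matrix; it assumes a closed-set gallery (every probe's true class appears in the dictionary) and is a generic illustration, not the paper's code.

```python
import numpy as np

def rank1_and_cmc(score_matrix, probe_labels, class_labels, max_rank=10):
    """score_matrix: (n_probes, n_classes) similarities, higher = better match.
    Returns the rank-1 accuracy and the CMC curve up to max_rank."""
    order = np.argsort(-score_matrix, axis=1)     # classes sorted best-first per probe
    ranked = class_labels[order]                  # class labels in rank order
    hits = ranked == probe_labels[:, None]        # where the true class appears
    first_hit = hits.argmax(axis=1)               # 0-based rank of the correct class
    cmc = np.array([(first_hit < r).mean() for r in range(1, max_rank + 1)])
    return cmc[0], cmc

# Toy usage with random scores for 5 probes over 4 gallery classes
rng = np.random.default_rng(0)
scores = rng.random((5, 4))
rank1, cmc = rank1_and_cmc(scores, probe_labels=np.array([0, 1, 2, 3, 0]),
                           class_labels=np.arange(4), max_rank=4)
```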
Tables 5 and 6 report the EER results and the corresponding rank-1 recognition rates when using DCT and CNN features, respectively. We sort the datasets by identification accuracy in ascending order; thus, the datasets further to the right should have better inter-class separability in terms of rank-1 accuracy. In the DCT experiments, SRC_sce always obtains lower EER results on the datasets further to the right, except on USTB III (50), as shown in Table 5. However, looking at the ROC curves in Fig. 6, we can see that the overall verification performance of SRC_sce on USTB III (50) is better than that on USTB III. Therefore, when using DCT features, SRC_sce achieves better verification performance on the datasets where SRC obtains better identification results.
Table 5
SRC-based identification accuracy (rank-1 accuracy, %) and verification error rate (EER, %) when using DCT features

Unimodal recognition:
  Dataset:         GT       AR       AR (79)   USTB III   USTB III (50)   PolyU
  Identification:  89.75    90       92.224    94.71      95.091          96.9
  Verification:    4.857    2.429    1.972     1.473      1.636           1.203

Multimodal recognition:
  Dataset:         MD I     MD II    MD III
  Identification:  99.068   99.408   99.757
  Verification:    0.545    0.195    0.125
Table 6
SRC-based identification accuracy (rank-1 accuracy, %) and verification error rate (EER, %) when using CNN features

Unimodal recognition:
  Dataset:         USTB III   USTB III (50)   AR       GT       PolyU    AR (79)
  Identification:  98.849     99.091          99.286   99.712   99.8     99.819
  Verification:    0.361      0.364           0.286    0.304    0.201    0.182

Multimodal recognition:
  Dataset:         MD I     MD II     MD III
  Identification:  100      100       100
  Verification:    0.018    6.32e-4   1.44e-3
We obtain the same result when using CNN features, as shown in Table 6. One may notice that SRC gets a better rank-1 accuracy on the GT dataset, yet SRC_sce obtains a slightly worse EER there than on the AR dataset. However, looking at the CMC curves in Fig. 13, one can see that the curve on the AR dataset converges to 100% accuracy faster than the curve on the GT dataset. In other words, the overall identification performance of SRC on the AR dataset is better than that on the GT dataset. It should also be noted that although SRC_sce achieves a slightly higher EER on USTB III (50) (0.364%) than on USTB III (0.361%), the ROC curves in Fig. 10 show that its overall verification performance is better on the former subset.
As for the SRC-based multimodal identification, when using DCT features all the CMC curves ascend rapidly to 100% within the first few ranks, while in the experiments with CNN features, SRC achieves 100% rank-1 accuracy on all multimodal datasets. This implies favorable inter-class separability on each multimodal dataset. Correspondingly, the ROC curves of the multimodal SRC_sce converge to 0% FRR very quickly, and they even lie on the axes when using CNN features, as shown in Figs. 11 and 12.
As for the comparison between the DCT and CNN experiments, we can see from Fig. 13 that when using CNN features, all the unimodal and multimodal samples are identified correctly by rank 6 on every dataset, while with DCT features many unimodal probe samples still cannot be identified even beyond rank 6. This again indicates that the CNN feature space offers much better inter-class separability, which is in line with the existing literature. As a result, the ROC curves obtained by SRC-based unimodal verification with DCT features decline slowly and cannot converge to 0% FRR, showing an evident FRR bottleneck problem. By contrast, SRC-based verification using CNN features achieves significant superiority in both unimodal and multimodal scenarios and on all datasets.
The above experimental results demonstrate that SRC-based verification achieves better performance in the scenarios where SRC also achieves better identification performance. There is evidently a positive correlation between the performance of SRC-based verification and the inter-class separability of the samples in the coding dictionary. This characteristic can serve as a guideline for SRC-based verification applications: SRC-based verification may not be suitable for biometrics or scenarios with inferior inter-class separability. In our study, compared with unimodal face, palmprint, and ear verification, their multimodal combinations are more suitable for SRC-based verification.

Small random dictionary

We evaluate the effectiveness and efficiency of small random dictionaries on MD II and MD III using the unimodal and multimodal SRC_sce with both DCT and CNN features. On each dataset, we test 10 small random dictionaries with 50 subjects each. Figure 14 uses boxplots to illustrate the EER distributions of SRC_sce with the small random dictionaries; a red diamond marker denotes the EER of SRC_sce using the full dictionary with all subjects of the same dataset. We can see from Fig. 14 that, in each case, all the small random dictionaries achieve similar EER results with small variance. In both the unimodal and multimodal experiments with DCT features, most small random dictionaries are better than or comparable to the full dictionary, and the rest are only slightly worse. When using CNN features, the superiority of the small random dictionaries over the full dictionary is much more evident.
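A minimal sketch of the random dictionary-shrinkage idea is given below: the coding dictionary is assembled from the claimed class plus a random subset of non-target classes, and the probe is then sparsely coded over it. Here scikit-learn's Lasso stands in for the l1_ls solver used in the paper, and the class-wise reconstruction residual is a generic stand-in for the paper's sparsity-based matching measures; the function names and parameters are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def build_small_dictionary(gallery, claimed_id, n_nontarget=49, rng=None):
    """Stack the claimed class's gallery with randomly selected non-target classes.
    `gallery` maps class id -> (d, n_i) matrix of column-wise L2-normalised features."""
    rng = np.random.default_rng() if rng is None else rng
    others = [c for c in gallery if c != claimed_id]
    chosen = [claimed_id] + list(rng.choice(others, size=n_nontarget, replace=False))
    D = np.hstack([gallery[c] for c in chosen])
    labels = np.concatenate([[c] * gallery[c].shape[1] for c in chosen])
    return D, labels

def claimed_class_residual(probe, D, labels, claimed_id, alpha=0.01):
    """Sparse-code the probe over the small dictionary and return the claimed
    class's reconstruction residual (lower = better match); Lasso replaces l1_ls."""
    x = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(D, probe).coef_
    mask = labels == claimed_id
    return float(np.linalg.norm(probe - D[:, mask] @ x[mask]))
```

In a deployment, the returned residual would simply be compared against the system's operating threshold like any other matching score.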
Table 7 reports the average running time per sample of SRC_sce unimodal verification on the AR and PolyU datasets. We can see that verification with a small random dictionary is much more efficient than with the full dictionary. Note that the time consumption of our multimodal verification using Sum fusion is about twice that of the unimodal verification.
Table 7
Average running time (s) per sample of SRC-based verification with different dictionary scales when using CNN features on MD III

  Dictionary scale   AR      PolyU
  100 subjects       0.807   0.799
  50 subjects        0.258   0.277
In this series of experiments, we can see that, compared with the full dictionary, SRC-based verification using small random dictionaries generally achieves better or comparable verification results. Considering the large user scale in real-world applications, SRC-based verification using a dictionary with all the enrolled subjects will inevitably suffer from accuracy degradation and a heavy or even unaffordable computational cost. Therefore, shrinking the large dictionary to a suitable scale is indispensable in large-scale user applications.

Conclusions and future directions

In this paper, we have first provided an insight into SRC-based biometric verification by studying two sparsity-based matching measures on three biometric traits and their multimodal combinations, using both handcrafted and deep CNN features. The sparse coding in SRC-based verification can be seen as a one-to-many competitive matching process in which the client has to compete with non-target subjects for a convincing sparsity-based matching score. Essentially, SRC-based verification not only examines the matching score obtained by the client but also implicitly compares the correlations of the query data with a limited number of non-target subjects, and thereby offers enhanced protection for identity security. Extensive experimental results demonstrate that in both unimodal and multimodal scenarios, SRC-based verification achieves overwhelming superiority over many well-known methods based on one-to-one matching and cosine similarity, especially when using multimodal fusion and CNN features.
The foremost concern about SRC-based verification is that, if the genuine class fails to obtain a top rank in encoding the query data when data degeneration occurs, an extremely inferior genuine score is very likely, which leads to a long-tail effect in the ROC curve. We call this effect the FRR bottleneck problem. If this problem cannot be resolved or avoided properly, SRC-based verification may not be suitable for some biometrics and for application scenarios requiring a very low FRR and high user acceptability. This problem is particularly prominent in unimodal verification using DCT features. By contrast, we did not observe this effect when using advanced deep CNN features and multimodal combinations on any of the three multimodal datasets.
We also found a strong correlation between the performance of SRC-based verification and the inter-class separability among the classes in the coding dictionary. Hence, whether the coding dictionary has a well-separated feature distribution is critical. SRC-based verification is well suited to application scenarios where favorable inter-class separability is available, such as multimodal biometrics and discriminative deep learning features. On the other hand, SRC-based verification may not be suitable for all biometric applications; one can assess its feasibility by referring to the existing relevant SRC-based identification studies.
Another major challenge is that, owing to the utilization of non-target subjects, a large-scale coding dictionary will inevitably bring a huge computational burden and is also likely to degrade verification accuracy. In large-scale user applications, we suggest selecting a suitably small subset of non-target subjects with a well-separated feature distribution. In our experiments, a simple dictionary shrinkage strategy based on cluster analysis and random selection of non-target subjects generally improves verification accuracy while maintaining high efficiency.
The introduction of non-target subjects may also raise a concern about increasing the number of vulnerabilities that can be exploited by intruders. One may worry that intruders could deceive an SRC-based verification system using biometric traits stolen from non-target subjects. We emphasize again that the class-specific sparsity-based matching score used for the verification comparison is the one associated with the identity claimed. According to the characteristics of SRC, the corresponding class is very likely to be assigned an inferior score and thus be correctly rejected. Furthermore, if the non-target subjects do not exist in reality, for example, when using virtual individuals created by GAN models, there will be no opportunity for theft other than breaking into the system database.
In the future, more efforts can be made on the following aspects:
  • Sparsity-based matching measures. The existing sparsity-based matching measures use either coding coefficients or reconstruction residues. However, in practice, the L1-norm optimization often needs to compromise between reconstruction fidelity and sparseness, especially when the input data has low biometric quality. A promising direction is to integrate the discriminative cues in both the coding coefficients and the reconstruction residues to design more robust sparsity-based matching measures.
  • Multimodal verification. Recent studies have shown that even state-of-the-art deep learning-based face verification approaches are highly vulnerable to low-level print and replay presentation attacks [10, 11], and more unpredictable advanced attacks will likely emerge in the near future. Given that people's face images can easily be obtained or stolen, it seems very difficult to eliminate the vulnerability of unimodal face recognition to presentation attacks. On the other hand, it is much more difficult to fool a multimodal system that combines the face with other biometric traits that are harder to steal and can also be captured contactlessly. Our study suggests that SRC-based multimodal verification using deep learning features can achieve high accuracy while avoiding some of these shortcomings. We believe that the combination of deep learning features, multiple biometrics, and SRC classification techniques can provide a good trade-off between verification accuracy and security.
  • Large-scale datasets. A major challenge at the moment is that there are no suitable large-scale datasets available for SRC-based verification research. Note that SRC requires sufficient well-controlled training samples per user if their data or features are used directly to build the overcomplete dictionary. However, most of the publicly available large-scale datasets are collected in unconstrained environments, and it is cumbersome and expensive to collect a large-scale dataset with sufficiently well-controlled samples per user without industry support. A more efficient way is to select suitable data from the many existing large-scale datasets. Another alternative is to use data generation techniques, such as GANs and 3D face models, to produce multiple simulated samples with variations in pose and illumination based on users' enrolled data. Dictionary augmentation techniques, such as supplementing an intra-class variation dictionary as in Refs. [18–23], can also help to alleviate the under-sampled problem and thus relax the data requirements of SRC-based verification on large-scale datasets.

Acknowledgements

This work was supported in part by National Natural Science Foundation of China under Grant Nos 61602390, 61876072, 61860206007, U19A2071, the Major Project of National Social Science Foundation of China (Grant 21&ZD166), and the Xihua University Funds for Young Scholar.

Declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The authors declare that there are no conflicts of interest regarding the publication of this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Literature
1. Jain AK, Flynn P, Ross AA (2007) Handbook of biometrics. Springer, New York
2. Li S, Jain AK (2004) Handbook of face recognition. Springer, New York
3. Huang Z, Feng Z, Kittler J, Liu Y (2018) Improve the spoofing resistance of multimodal verification with representation-based measures. In: The first Chinese conference on pattern recognition and computer vision, Guangzhou
4. Ross A, Nandakumar K, Jain AK (2006) Handbook of multibiometrics. Springer, New York
5. Zhang L, Yang M, Feng X (2011) Sparse representation or collaborative representation: which helps face recognition? In: 2011 International conference on computer vision, Barcelona
6. Wang M, Deng W (2021) Deep face recognition: a survey. Neurocomputing 429:215–244
8. Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst
9. Kemelmacher-Shlizerman I, Seitz SM, Miller D, Brossard E (2016) The MegaFace benchmark: 1 million faces for recognition at scale. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4873–4882
10. Mohammadi A, Bhattacharjee S, Marcel S (2018) Deeply vulnerable: a study of the robustness of face recognition to presentation attacks. IET Biometrics 7(1):15–26
11. Fang M, Damer N, Kirchbuchner F, Kuijper A (2022) Real masks and spoof faces: on the masked face presentation attack detection. Pattern Recogn 123:108398
14. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227
15. Yang M, Zhang L, Yang J, Zhang D (2012) Regularized robust coding for face recognition. IEEE Trans Image Process 22(5):1753–1766
17. Liao M, Gu X (2020) Face recognition approach by subspace extended sparse representation and discriminative feature learning. Neurocomputing 373:35–49
20. Deng W, Hu J, Guo J (2018) Face recognition via collaborative representation: its discriminant nature and superposed representation. IEEE Trans Pattern Anal Mach Intell 40(10):2513–2521
21. Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322
22. Yang M, Wang X, Zeng G, Shen L (2016) Joint and collaborative representation with local adaptive convolution feature for face recognition with single sample per person. Pattern Recogn 66(C):117–128
23. Wang X, Zhang B, Yang M, Ke K, Zheng W (2019) Robust joint representation with triple local feature for face recognition with single sample per person. Knowl-Based Syst 181:104790
24. Vo DM, Lee SW (2018) Robust face recognition via hierarchical collaborative representation. Inf Sci 432:332–346
25. Xu J, An W, Zhang L, Zhang D (2019) Sparse, collaborative, or nonnegative representation: which helps pattern classification? Pattern Recogn 88:679–688
26. Wang W, Tang C, Wang X, Luo Y, Li J (2019) Image object recognition via deep feature-based adaptive joint sparse representation. Comput Intell Neurosci 2019(2):1–9
27. Kua J, Ambikairajah E, Epps J, Togneri R (2011) Speaker verification using sparse representation classification. In: Proc. ICASSP, pp 4548–4551
28. Li M, Zhang X, Yan Y, Narayanan S (2011) Speaker verification using sparse representations on total variability i-vectors. In: 12th annual conference of the International Speech Communication Association, Florence, pp 2729–2732
29. Kua JMK, Epps J, Ambikairajah E (2013) i-Vector with sparse representation classification for speaker verification. Speech Commun 55(5):707–720
30. Hasheminejad M, Farsi H (2017) Frame level sparse representation classification for speaker verification. Multimed Tools Appl 76:21211–21224
31. Xin Y, Liu Z, Zhang HX, Zhang H (2012) Finger vein verification system based on sparse representation. Appl Opt 51(25):6252–6258
32. Shin W, Lee S, Min H, Hosik S, Ro Y (2013) Face verification using color sparse representation. Lect Notes Comput Sci 7809:290–299
33. Huang Z, Liu Y, Li C, Yang M, Chen L (2013) A robust face and ear based multimodal biometric system using sparse representation. Pattern Recogn 46(8):2156–2168
34. Huang Z, Liu Y, Li X, Li J (2015) An adaptive bimodal recognition framework using sparse coding for face and ear. Pattern Recogn Lett 53(1):69–76
35. Poh N, Kittler J (2012) A unified framework for biometric expert fusion incorporating quality measures. IEEE Trans Pattern Anal Mach Intell 34(1):3–18
37. Shao C, Song X, Feng ZH, Wu XJ, Zheng Y (2017) Dynamic dictionary optimization for sparse-representation-based face classification using local difference images. Inf Sci 393:1–14
38. Nandakumar K, Chen Y, Dass SC, Jain AK (2008) Likelihood ratio-based biometric score fusion. IEEE Trans Pattern Anal Mach Intell 30(2):342–347
39. Yang J, Zhang L, Xu Y, Yang JY (2012) Beyond sparsity: the role of L1-optimizer in pattern classification. Pattern Recogn 45(3):1104–1118
40. Yang M, Zhang L, Shiu S, Zhang D (2013) Gabor feature based robust representation and classification for face recognition with Gabor occlusion dictionary. Pattern Recogn 46(7):1865–1878
43. Lai J, Jiang X (2012) Modular weighted global sparse representation for robust face recognition. IEEE Signal Process Lett 19(9):571–574
48. Xu Y, Zhu X, Li Z, Liu G, Lu Y, Liu H (2013) Using the original and 'symmetrical face' training samples to perform representation based two-step face recognition. Pattern Recogn 46(4):1151–1158
50. Verlinde P, Cholet G (1999) Comparing decision fusion paradigms using k-NN based classifiers, decision trees and logistic regression in a multi-modal identity verification application. In: AVBPA, pp 188–193
51. Merati A, Poh N, Kittler J (2012) User-specific cohort selection and score normalization for biometric systems. IEEE Trans Inf Forensics Secur 7(4):1270–1277
52. Zuo W, Lin Z, Guo Z, Zhang D (2010) The multiscale competitive code via sparse representation for palmprint verification. In: IEEE conference on computer vision and pattern recognition, pp 2265–2272
53. Haris BC, Rohit S (2012) Sparse representation over learned and discriminatively learned dictionaries for speaker verification. In: Proc. ICASSP, pp 4785–4788
54. Kumar A, Chan TS (2013) Robust ear identification using sparse representation of local texture descriptors. Pattern Recogn 46(1):73–85
55. Wright J, Ma Y, Mairal J, Sapiro G, Huang TS, Yan S (2010) Sparse representation for computer vision and pattern recognition. Proc IEEE 98(6):1031–1044
57. Martinez AM, Benavente R (1998) The AR face database. CVC Technical Report 24
58. Li W, Zhang L, Zhang D, Lu G, Yan J (2010) Efficient joint 2D and 3D palmprint matching with alignment refinement. In: IEEE conference on computer vision and pattern recognition, San Francisco
59. Li W, Zhang D, Zhang L, Lu G, Yan J (2011) 3-D palmprint recognition with joint line and orientation features. IEEE Trans Syst Man Cybern Part C 41(2):274–279
61. Kim SJ, Koh K, Lustig M, Boyd S, Gorinevsky D (2007) An interior-point method for large-scale l1-regularized least squares. IEEE J Sel Top Signal Process 1(4):606–617
62. Cheng H, Liu Z, Yang L, Chen X (2013) Sparse representation and learning in visual recognition: theory and applications. Signal Process 93(6):1408–1425
63. Wright J, Ma Y (2011) Dense error correction via l1-minimization. IEEE Trans Inf Theory 56(7):3540–3560
64. Song X, Feng ZH, Hu G, Kittler J, Wu XJ (2018) Dictionary integration using 3D morphable face models for pose-invariant collaborative-representation-based classification. IEEE Trans Inf Forensics Secur 13(11):2734–2745
65. Goodfellow IJ, Pouget-Abadie J, Mirza M et al (2014) Generative adversarial networks. Adv Neural Inf Process Syst 3:2672–2680
67. Dinca LM, Hancke G (2017) The fall of one, the rise of many: a survey on multi-biometric fusion methods. IEEE Access 5:6247–6289
68. Huang Z, Liu Y, Huang R, Yang M (2013) Frameworks for multimodal biometric using sparse coding. Lect Notes Comput Sci 7751:433–440
69. Abaza A, Ross A, Hebert C, Harrison M, Nixon MS (2010) A survey on ear biometrics. ACM Trans Embedded Comput Syst 9(4), Article 39
70. Kittler J, Hatef M, Duin RPW et al (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3):226–239
71. Liu Y, You Z, Cao L (2006) A novel and quick SVM-based multi-class classifier. Pattern Recogn 39(11):2258–2264
Metadata
Title
A study of sparse representation-based classification for biometric verification based on both handcrafted and deep learning features
Authors
Zengxi Huang
Jie Wang
Xiaoming Wang
Xiaoning Song
Mingjin Chen
Publication date
22-09-2022
Publisher
Springer International Publishing
Published in
Complex & Intelligent Systems / Issue 2/2023
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI
https://doi.org/10.1007/s40747-022-00868-6
