Fast features for face authentication under illumination direction changes
Introduction
The field of face recognition can be divided into two areas: face identification and face verification (also known as authentication). A face verification system verifies the claimed identity based on images (or a video sequence) of the claimant’s face; this is in contrast to an identification system, which attempts to find the identity of a given person out of a pool of N people.
Verification systems pervade our everyday life; for example, Automatic Teller Machines (ATMs) employ simple identity verification where the user, after inserting their ATM card, is asked to enter a password known only to them; if the password matches the one assigned to the card, the user is allowed access to their bank account. However, a verification system such as the one used in an ATM only verifies the validity of the combination of a certain possession (in this case, the ATM card) and certain knowledge (the password). The ATM card can be lost or stolen, and the password can be compromised (e.g. somebody looks over your shoulder while you’re keying it in). In order to address this issue, biometric verification methods have emerged, where the password is either replaced by, or used in addition to, biometrics such as the person’s speech, face image or fingerprints. More information about the field of biometrics can be found in papers by Bolle et al. (2002), Dugelay et al. (2002) and Woodward (1997).
Generally speaking, a full face recognition system can be thought of as being comprised of three stages:
- 1. Face localization and segmentation
- 2. Normalization
- 3. The actual face identification/verification, which can be further subdivided into:
  - (a) Feature extraction
  - (b) Classification
The second stage (normalization) usually involves an affine transformation (Gonzales and Woods, 1993) (to correct for size and rotation), but it can also involve an illumination normalization (however, illumination normalization may not be necessary if the feature extraction method is robust against varying illumination). In this letter we shall concentrate on the feature extraction part of the last stage.
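As an illustration of the geometric part of the normalization stage, the sketch below computes a similarity transform (rotation plus isotropic scale plus translation, a special case of the affine transform) that maps detected eye centres onto canonical positions. The eye-based alignment and the canonical coordinates are illustrative assumptions, not details taken from this letter:

```python
import numpy as np

def eye_alignment_transform(left_eye, right_eye,
                            target_left=(16.0, 24.0), target_right=(16.0, 40.0)):
    """Similarity transform mapping detected eye centres onto canonical
    positions. Points are (row, col); the canonical eye positions are
    illustrative assumptions, not values from the letter."""
    p = np.asarray(right_eye, float) - np.asarray(left_eye, float)
    q = np.asarray(target_right, float) - np.asarray(target_left, float)
    scale = np.hypot(*q) / np.hypot(*p)
    # angles measured with 'row' as the y-axis and 'col' as the x-axis
    angle = np.arctan2(q[0], q[1]) - np.arctan2(p[0], p[1])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    A = np.array([[c, s], [-s, c]])  # rotation + scale in (row, col) ordering
    t = np.asarray(target_left, float) - A @ np.asarray(left_eye, float)
    return A, t  # a source point x maps to A @ x + t
```

Applying `A @ x + t` to every pixel coordinate (with interpolation) corrects the face image for size and in-plane rotation before feature extraction.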
There are many approaches to face based systems, ranging from the ubiquitous Principal Component Analysis (PCA) approach (also known as eigenfaces) (Turk and Pentland, 1991), Dynamic Link Architecture (also known as elastic graph matching) (Duc et al., 1999), Artificial Neural Networks (Lawrence et al., 1997), to pseudo-2D Hidden Markov Models (HMM) (Samaria, 1994; Eickeler et al., 2000). Recent surveys on face recognition can be found in papers by Chellappa et al. (1995), Zhang et al. (1997) and Grudin (2000).
The above-mentioned systems differ in terms of the feature extraction procedure and/or the classification technique used. For example, Turk and Pentland (1991) used PCA for feature extraction and a nearest neighbour classifier for recognition. Duc et al. (1999) used biologically inspired 2D Gabor wavelets (Lee, 1996) for feature extraction, while employing the Dynamic Link Architecture as part of the classifier. Eickeler et al. (2000) obtained features using the 2D Discrete Cosine Transform (DCT) and used the pseudo-2D HMM as the classifier.
PCA derived features have been shown to be sensitive to changes in the illumination direction (Belhumeur et al., 1997) causing rapid degradation in verification performance. A study by Zhang et al. (1997) has shown a system employing 2D Gabor wavelet derived features to be robust to moderate changes in the illumination direction; however, Adini et al. (1997) showed that the 2D Gabor wavelet derived features are sensitive to gross changes in the illumination direction.
Belhumeur et al. (1997) proposed robust features based on Fisher’s Linear Discriminant; however, to achieve robustness, the system required face images with varying illumination for training purposes.
As will be shown, 2D DCT based features are also sensitive to changes in the illumination direction. In this letter we introduce four new techniques, which are significantly less affected by an illumination direction change: DCT-delta, DCT-mod, DCT-mod-delta and DCT-mod2. We will show that the DCT-mod2 method, which utilizes polynomial coefficients derived from 2D DCT coefficients of spatially neighbouring blocks, is the most suitable. We then compare the robustness and performance of the DCT-mod2 method against three popular feature extraction techniques: eigenfaces (PCA), PCA with histogram equalization and 2D Gabor wavelets.
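To make the idea of deriving polynomial coefficients from spatially neighbouring blocks concrete, the following hypothetical sketch builds a DCT-mod2-style feature vector by replacing the first few (most illumination-sensitive) 2D DCT coefficients with first-order deltas over neighbouring blocks. The delta window, the number of replaced coefficients and the resulting dimensionality are assumptions for illustration, not the letter's exact definition:

```python
import numpy as np

def dct_mod2_vector(coeff_grid, y, x, M=15, R=3):
    """Hypothetical sketch of a DCT-mod2-style feature vector for the block
    at grid position (y, x). coeff_grid[y][x] holds that block's zig-zag
    ordered 2D DCT coefficients. The first R coefficients are replaced by
    first-order horizontal and vertical deltas computed from the immediate
    neighbouring blocks; M, R and the +/-1 window are illustrative."""
    c = coeff_grid[y][x][:M]
    delta_h = (coeff_grid[y][x + 1][:R] - coeff_grid[y][x - 1][:R]) / 2.0
    delta_v = (coeff_grid[y + 1][x][:R] - coeff_grid[y - 1][x][:R]) / 2.0
    # an additive offset common to all blocks cancels in the deltas;
    # the higher-order coefficients c[R:] are kept as-is
    return np.concatenate([delta_h, delta_v, c[R:]])
```

Note the intended property: a constant shift applied to a low-order coefficient of every block (a crude model of an illumination change) leaves the delta components unchanged.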
The rest of the letter is organized as follows. In Section 2 we briefly review the 2D DCT feature extraction technique and describe the proposed feature extraction methods which build from the 2D DCT. In Section 3 we describe a Gaussian Mixture Model (GMM) based classifier which shall be used as the basis for experiments. The performance of the traditional and proposed feature extraction techniques is compared in Section 4, using an artificial illumination direction change. Section 5 is devoted to experiments on the Weizmann database (Adini et al., 1997) which has more realistic illumination direction changes.
To keep consistency with traditional matrix notation, pixel locations (and image sizes) are described using the row(s) first, followed by the column(s).
2D discrete cosine transform (DCT)
Here the given face image is analyzed on a block by block basis. Given an image block f(y,x), where y,x = 0,1,…,N_P−1 (here we use N_P = 8), we decompose it in terms of orthogonal 2D DCT basis functions (see Fig. 1). The result is an N_P×N_P matrix C(v,u) containing 2D DCT coefficients:

$$C(v,u)=\alpha(v)\,\alpha(u)\sum_{y=0}^{N_P-1}\sum_{x=0}^{N_P-1}f(y,x)\cos\left[\frac{(2y+1)v\pi}{2N_P}\right]\cos\left[\frac{(2x+1)u\pi}{2N_P}\right]$$

for v,u = 0,1,2,…,N_P−1, where

$$\alpha(v)=\begin{cases}\sqrt{1/N_P}&\text{for }v=0\\\sqrt{2/N_P}&\text{for }v=1,2,\ldots,N_P-1\end{cases}$$

and α(u) is defined analogously. The coefficients are ordered according to a zig-zag pattern, reflecting the amount of information contained in each coefficient.
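The block decomposition above can be sketched in NumPy; the direct evaluation of the basis functions and the zig-zag scan below are a minimal sketch of the standard orthonormal 2D DCT-II, not the implementation used in the letter:

```python
import numpy as np

N_P = 8  # block size used in the letter

def dct_2d(block):
    """2D DCT-II of an N x N block, computed directly from the
    basis-function definition (separable cosine transform)."""
    N = block.shape[0]
    n = np.arange(N)
    # alpha(0) = sqrt(1/N), alpha(v>0) = sqrt(2/N)  (orthonormal scaling)
    alpha = np.sqrt(np.where(n == 0, 1.0 / N, 2.0 / N))
    # basis[v, y] = alpha(v) * cos((2y+1) v pi / (2N))
    basis = alpha[:, None] * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    return basis @ block @ basis.T  # C(v, u)

def zigzag(C):
    """Order coefficients along anti-diagonals (zig-zag scan)."""
    N = C.shape[0]
    idx = sorted(((v, u) for v in range(N) for u in range(N)),
                 key=lambda p: (p[0] + p[1],
                                p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))
    return np.array([C[v, u] for v, u in idx])

block = np.arange(64, dtype=float).reshape(N_P, N_P)  # toy 8x8 "image block"
coeffs = zigzag(dct_2d(block))
```

Because the scaling is orthonormal, the transform preserves energy, and the first zig-zag coefficient is proportional to the block's mean intensity, which is why it is the most sensitive to illumination.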
GMM based classifier
Given a claim for person C’s identity and a set of feature vectors X = {x_i}_{i=1}^{N_V} supporting the claim, the average log likelihood of the claimant being the true claimant is calculated using:

$$L(X|\lambda_C)=\frac{1}{N_V}\sum_{i=1}^{N_V}\log p(\vec{x}_i|\lambda_C)$$

where

$$p(\vec{x}|\lambda)=\sum_{j=1}^{N_G} m_j\,\mathcal{N}(\vec{x};\vec{\mu}_j,\Sigma_j)$$

Here, $\mathcal{N}(\vec{x};\vec{\mu},\Sigma)$ is a D-dimensional Gaussian function with mean $\vec{\mu}$ and diagonal covariance matrix $\Sigma$:

$$\mathcal{N}(\vec{x};\vec{\mu},\Sigma)=\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\vec{x}-\vec{\mu})^T\Sigma^{-1}(\vec{x}-\vec{\mu})\right]$$

λ_C is the parameter set for person C, N_G is the number of Gaussians and m_j is the weight for Gaussian j.
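Under the definitions above, the average log likelihood can be evaluated as follows; this is a minimal NumPy sketch of a diagonal-covariance GMM evaluation (the log-sum-exp step is a standard numerical-stability measure, not something prescribed by the letter):

```python
import numpy as np

def avg_log_likelihood(X, weights, means, variances):
    """Average log-likelihood of feature vectors X (N_V x D) under a
    diagonal-covariance GMM with N_G components.
    weights: (N_G,), means/variances: (N_G, D)."""
    N_V, D = X.shape
    diff = X[:, None, :] - means[None, :, :]          # (N_V, N_G, D)
    # log of each Gaussian at every vector: shape (N_V, N_G)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
    log_g = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    # log p(x) = log sum_j m_j N(x; mu_j, Sigma_j), via log-sum-exp for stability
    log_p = np.logaddexp.reduce(np.log(weights)[None, :] + log_g, axis=1)
    return log_p.mean()
```

A verification decision is then made by comparing this average log likelihood (or a normalized score derived from it) against a threshold.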
VidTIMIT audio-visual database
The VidTIMIT database (Sanderson, 2002) is comprised of video and corresponding audio recordings of 43 people (19 female and 24 male), reciting short sentences. It was recorded in three sessions, with a mean delay of 7 days between Sessions 1 and 2, and 6 days between Sessions 2 and 3. There are 10 sentences per person; the first six sentences are assigned to Session 1, the next two to Session 2 and the remaining two to Session 3. The mean duration of each sentence is 4.25 s.
Experiments on the Weizmann database
The experiments described in Section 4 utilized an artificial illumination direction change. In this section we shall compare the performance of 2D DCT, 2D Gabor and DCT-mod2 feature sets on the Weizmann database (Adini et al., 1997), which has more realistic illumination direction changes.
It must be noted that the database is rather small, as it is comprised of images of 27 people; moreover, for the direct frontal view, there is only one image per person with uniform illumination (the training image).
Conclusion
In this letter we proposed four new facial feature extraction techniques which are resistant to the effects of illumination direction changes; out of the proposed methods, the DCT-mod2 technique, which utilizes polynomial coefficients derived from 2D DCT coefficients of spatially neighbouring blocks, is the most suitable. Face verification results on the VidTIMIT database suggest that the DCT-mod2 feature set is superior (in terms of robustness to illumination direction changes and discrimination ability).
References (32)
- Bolle et al., Biometric perils and patches, Pattern Recognit. (2002)
- et al., Why recognition in a statistics-based face recognition system should be based on the pure face portion: a probabilistic decision-based proof, Pattern Recognit. (2001)
- Eickeler et al., Recognition of JPEG compressed face images based on statistical methods, Image Vision Comput. (2000)
- Recent advances in speaker recognition, Pattern Recognition Lett. (1997)
- Grudin, On internal representations in face recognition systems, Pattern Recognit. (2000)
- et al., Morphological elastic graph matching applied to frontal face authentication under well-controlled and real conditions, Pattern Recognit. (2000)
- Speaker identification and verification using Gaussian Mixture Speaker Models, Speech Commun. (1995)
- et al., Speaker verification using adapted Gaussian mixture models, Digital Signal Process. (2000)
- Adini et al., Face recognition: The problem of compensating for changes in illumination direction, IEEE Trans. Pattern Anal. Machine Intell. (1997)
- Belhumeur et al., Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Trans. Pattern Anal. Machine Intell. (1997)
- Gonzales and Woods, Digital Image Processing (1993)
- Chellappa et al., Human and machine recognition of faces: a survey, Proc. IEEE (1995)
- Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B
- Duc et al., Face authentication with Gabor information on deformable graphs, IEEE Trans. Image Process. (1999)
- Pattern Classification