Pattern Recognition Letters

Volume 24, Issue 14, October 2003, Pages 2409–2419
Fast features for face authentication under illumination direction changes

https://doi.org/10.1016/S0167-8655(03)00070-9

Abstract

In this letter we propose a facial feature extraction technique which utilizes polynomial coefficients derived from 2D Discrete Cosine Transform (DCT) coefficients obtained from horizontally and vertically neighbouring blocks. Face authentication results on the VidTIMIT database suggest that the proposed feature set is superior (in terms of robustness to illumination changes and discrimination ability) to features extracted using four popular methods: Principal Component Analysis (PCA), PCA with histogram equalization pre-processing, 2D DCT and 2D Gabor wavelets; the results also suggest that histogram equalization pre-processing increases the error rate and offers no help against illumination changes. Moreover, the proposed feature set is over 80 times faster to compute than features based on Gabor wavelets. Further experiments on the Weizmann database also show that the proposed approach is more robust than 2D Gabor wavelets and 2D DCT coefficients.

Introduction

The field of face recognition can be divided into two areas: face identification and face verification (also known as authentication). A face verification system verifies the claimed identity based on images (or a video sequence) of the claimant’s face; this is in contrast to an identification system, which attempts to find the identity of a given person out of a pool of N people.

Verification systems pervade our everyday life; for example, Automatic Teller Machines (ATMs) employ simple identity verification where the user is asked to enter their password (known only to the user) after inserting their ATM card; if the password matches the one assigned to the card, the user is allowed access to their bank account. However, a verification system such as the one used in an ATM only verifies the validity of the combination of a certain possession (in this case, the ATM card) and certain knowledge (the password). The ATM card can be lost or stolen, and the password can be compromised (e.g. somebody looks over your shoulder while you're keying it in). In order to address this issue, biometric verification methods have emerged, where the password can be either replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. More information about the field of biometrics can be found in papers by Bolle et al. (2002), Dugelay et al. (2002) and Woodward (1997).

Generally speaking, a full face recognition system can be thought of as being comprised of three stages:

1. Face localization and segmentation
2. Normalization
3. The actual face identification/verification, which can be further subdivided into:
   (a) Feature extraction
   (b) Classification

The second stage (normalization) usually involves an affine transformation (Gonzalez and Woods, 1993) to correct for size and rotation, but it can also involve illumination normalization (which may not be necessary if the feature extraction method is robust against varying illumination). In this letter we shall concentrate on the feature extraction part of the last stage.
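To make the size-and-rotation correction concrete, the following is a minimal sketch of geometric normalization driven by two detected eye positions. The canonical eye positions, output size and function name are hypothetical choices for illustration; the paper does not specify these details.

```python
import cv2
import numpy as np

def normalize_face(image, left_eye, right_eye,
                   out_size=(56, 64),                 # (rows, cols), hypothetical
                   canonical=((18, 16), (18, 48))):   # target (y, x) eye positions, hypothetical
    """Correct for size and in-plane rotation via a similarity (affine) warp.

    left_eye/right_eye are (y, x) pixel locations found by a face/eye
    localizer (stage 1); the canonical positions are arbitrary choices.
    """
    (ly, lx), (ry, rx) = left_eye, right_eye
    (cly, clx), (cry, crx) = canonical

    # Rotation angle and scale implied by the two eye positions.
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))
    scale = np.hypot(crx - clx, cry - cly) / np.hypot(rx - lx, ry - ly)

    # Rotate/scale about the eye midpoint, then translate the midpoint
    # to its canonical location. OpenCV expects (x, y) point order.
    mid = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(mid, angle, scale)
    M[0, 2] += (clx + crx) / 2.0 - mid[0]
    M[1, 2] += (cly + cry) / 2.0 - mid[1]

    rows, cols = out_size
    return cv2.warpAffine(image, M, (cols, rows))
```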

There are many approaches to face-based systems, ranging from the ubiquitous Principal Component Analysis (PCA) approach (also known as eigenfaces) (Turk and Pentland, 1991), Dynamic Link Architecture (also known as elastic graph matching) (Duc et al., 1999) and Artificial Neural Networks (Lawrence et al., 1997), to pseudo-2D Hidden Markov Models (HMMs) (Samaria, 1994; Eickeler et al., 2000). Recent surveys on face recognition can be found in papers by Chellappa et al. (1995), Zhang et al. (1997) and Grudin (2000).

The above-mentioned systems differ in terms of the feature extraction procedure and/or the classification technique used. For example, Turk and Pentland (1991) used PCA for feature extraction and a nearest neighbour classifier for recognition. Duc et al. (1999) used biologically inspired 2D Gabor wavelets (Lee, 1996) for feature extraction, while employing the Dynamic Link Architecture as part of the classifier. Eickeler et al. (2000) obtained features using the 2D Discrete Cosine Transform (DCT) and used the pseudo-2D HMM as the classifier.

PCA derived features have been shown to be sensitive to changes in the illumination direction (Belhumeur et al., 1997) causing rapid degradation in verification performance. A study by Zhang et al. (1997) has shown a system employing 2D Gabor wavelet derived features to be robust to moderate changes in the illumination direction; however, Adini et al. (1997) showed that the 2D Gabor wavelet derived features are sensitive to gross changes in the illumination direction.

Belhumeur et al. (1997) proposed robust features based on Fisher’s Linear Discriminant; however, to achieve robustness, the system required face images with varying illumination for training purposes.

As will be shown, 2D DCT based features are also sensitive to changes in the illumination direction. In this letter we introduce four new techniques, which are significantly less affected by an illumination direction change: DCT-delta, DCT-mod, DCT-mod-delta and DCT-mod2. We will show that the DCT-mod2 method, which utilizes polynomial coefficients derived from 2D DCT coefficients of spatially neighbouring blocks, is the most suitable. We then compare the robustness and performance of the DCT-mod2 method against three popular feature extraction techniques: eigenfaces (PCA), PCA with histogram equalization and 2D Gabor wavelets.
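To preview the idea behind the delta-based methods before their formal definition in Section 2, the sketch below replaces the first few zig-zag ordered 2D DCT coefficients of each block (which carry most of the illumination-sensitive low-frequency energy) with horizontal and vertical deltas computed from neighbouring blocks. The use of simple central differences, the choice of replacing three coefficients and the array layout are illustrative assumptions; the paper's deltas are polynomial (regression-based) coefficients.

```python
import numpy as np

def dct_mod2_like_features(block_dct, n_replaced=3):
    """Illustrative DCT-mod2-style features.

    block_dct: array of shape (rows, cols, n_coefs) holding the zig-zag
    ordered DCT coefficients of each non-overlapping block. For each
    interior block, the first n_replaced coefficients are replaced by
    horizontal and vertical deltas (here: central differences across
    neighbouring blocks, an assumption).
    """
    rows, cols, _ = block_dct.shape
    feats = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Deltas over horizontally / vertically neighbouring blocks
            # for the low-order (illumination-sensitive) coefficients.
            dh = (block_dct[r, c + 1, :n_replaced]
                  - block_dct[r, c - 1, :n_replaced]) / 2.0
            dv = (block_dct[r + 1, c, :n_replaced]
                  - block_dct[r - 1, c, :n_replaced]) / 2.0
            rest = block_dct[r, c, n_replaced:]
            feats.append(np.concatenate([dh, dv, rest]))
    return np.array(feats)
```

For example, with 15 coefficients per block this layout yields 18-dimensional vectors (3 horizontal deltas, 3 vertical deltas and the 12 remaining coefficients).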

The rest of the letter is organized as follows. In Section 2 we briefly review the 2D DCT feature extraction technique and describe the proposed feature extraction methods which build from the 2D DCT. In Section 3 we describe a Gaussian Mixture Model (GMM) based classifier which shall be used as the basis for experiments. The performance of the traditional and proposed feature extraction techniques is compared in Section 4, using an artificial illumination direction change. Section 5 is devoted to experiments on the Weizmann database (Adini et al., 1997) which has more realistic illumination direction changes.

To keep consistency with traditional matrix notation, pixel locations (and image sizes) are described using the row(s) first, followed by the column(s).


2D discrete cosine transform (DCT)

Here the given face image is analyzed on a block-by-block basis. Given an image block $f(y,x)$, where $y,x=0,1,\ldots,N_P-1$ (here we use $N_P=8$), we decompose it in terms of orthogonal 2D DCT basis functions (see Fig. 1). The result is an $N_P\times N_P$ matrix $C(v,u)$ containing 2D DCT coefficients:

$$C(v,u)=\alpha(v)\,\alpha(u)\sum_{y=0}^{N_P-1}\sum_{x=0}^{N_P-1}f(y,x)\,\beta(y,x,v,u)\qquad\text{for }v,u=0,1,2,\ldots,N_P-1,$$

where

$$\alpha(v)=\begin{cases}\sqrt{1/N_P}&\text{for }v=0\\ \sqrt{2/N_P}&\text{for }v=1,2,\ldots,N_P-1\end{cases}$$

and

$$\beta(y,x,v,u)=\cos\!\left[\frac{(2y+1)v\pi}{2N_P}\right]\cos\!\left[\frac{(2x+1)u\pi}{2N_P}\right]$$

The coefficients are ordered according to a zig-zag pattern, reflecting the amount of information stored in each coefficient.
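A minimal sketch of the block decomposition above: SciPy's orthonormal DCT-II reproduces the $\alpha(v)\,\alpha(u)$ scaling, and the zig-zag ordering follows the usual JPEG-style traversal of anti-diagonals. The number of retained coefficients is left as a parameter, since the snippet above does not state it.

```python
import numpy as np
from scipy.fft import dctn

NP = 8  # block size used in the paper

def zigzag_indices(n=NP):
    """(v, u) index pairs ordered zig-zag along anti-diagonals,
    i.e. from low to high spatial frequencies."""
    idx = [(v, u) for v in range(n) for u in range(n)]
    return sorted(idx, key=lambda vu: (vu[0] + vu[1],
                                       vu[0] if (vu[0] + vu[1]) % 2 else -vu[0]))

def block_dct_features(image, n_coefs=15):
    """Zig-zag ordered 2D DCT coefficients of each non-overlapping
    NP x NP block; n_coefs (how many low-frequency coefficients
    to keep) is a free parameter here (assumption)."""
    order = zigzag_indices()[:n_coefs]
    h, w = image.shape
    feats = []
    for y in range(0, h - NP + 1, NP):
        for x in range(0, w - NP + 1, NP):
            # 'ortho' normalization matches the alpha(v) alpha(u) terms.
            C = dctn(image[y:y + NP, x:x + NP].astype(float), norm='ortho')
            feats.append([C[v, u] for v, u in order])
    return np.array(feats)
```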

GMM based classifier

Given a claim for person $C$'s identity and a set of feature vectors $X=\{x_i\}_{i=1}^{N_V}$ supporting the claim, the average log likelihood of the claimant being the true claimant is calculated using:

$$L(X|\lambda_C)=\frac{1}{N_V}\sum_{i=1}^{N_V}\log p(x_i|\lambda_C)$$

where

$$p(x|\lambda)=\sum_{j=1}^{N_G}m_j\,\mathcal{N}(x;\mu_j,\Sigma_j)\qquad\text{and}\qquad\lambda=\{m_j,\mu_j,\Sigma_j\}_{j=1}^{N_G}$$

Here, $\mathcal{N}(x;\mu,\Sigma)$ is a $D$-dimensional Gaussian function with mean $\mu$ and diagonal covariance matrix $\Sigma$:

$$\mathcal{N}(x;\mu,\Sigma)=\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\exp\!\left[-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right]$$

$\lambda_C$ is the parameter set for person $C$, $N_G$ is the number of Gaussians and $m_j$ is the weight for Gaussian $j$ (with constraints $\sum_{j=1}^{N_G}m_j=1$ and $m_j\geqslant 0$).
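A direct sketch of the scoring rule above for a diagonal-covariance GMM; fitting the parameter set λ (typically via the EM algorithm, Dempster et al., 1977) is assumed to have been done elsewhere, so the parameters are taken as given.

```python
import numpy as np

def avg_log_likelihood(X, weights, means, variances):
    """L(X | lambda_C): mean log-likelihood of feature vectors X
    (shape (NV, D)) under a diagonal-covariance GMM with NG components.

    weights: (NG,), means: (NG, D), variances: (NG, D) diagonal terms.
    """
    X = np.asarray(X, dtype=float)
    D = X.shape[1]
    # log N(x; mu_j, Sigma_j) for every (vector, component) pair -> (NV, NG)
    diff = X[:, None, :] - means[None, :, :]             # (NV, NG, D)
    log_norm = -0.5 * (D * np.log(2 * np.pi)
                       + np.log(variances).sum(axis=1))  # (NG,)
    log_gauss = log_norm[None, :] - 0.5 * (diff**2 / variances[None, :, :]).sum(axis=2)
    # log p(x | lambda) = logsumexp_j [ log m_j + log N(x; mu_j, Sigma_j) ]
    log_p = np.logaddexp.reduce(np.log(weights)[None, :] + log_gauss, axis=1)
    return log_p.mean()
```

A claim would then be accepted when L(X|λ_C) exceeds a decision threshold; for a model fitted with scikit-learn, GaussianMixture(covariance_type='diag').score(X) returns the same per-sample average log-likelihood.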

VidTIMIT audio-visual database

The VidTIMIT database (Sanderson, 2002) is comprised of video and corresponding audio recordings of 43 people (19 female and 24 male) reciting short sentences. It was recorded in 3 sessions, with a mean delay of 7 days between Sessions 1 and 2, and 6 days between Sessions 2 and 3. There are 10 sentences per person; the first six sentences are assigned to Session 1, the next two to Session 2 and the remaining two to Session 3. The mean duration of each sentence is 4.25 seconds.

Experiments on the Weizmann database

The experiments described in Section 4 utilized an artificial illumination direction change. In this section we shall compare the performance of 2D DCT, 2D Gabor and DCT-mod2 feature sets on the Weizmann database (Adini et al., 1997), which has more realistic illumination direction changes.

It must be noted that the database is rather small, as it is comprised of images of 27 people; moreover, for the direct frontal view, there is only one image per person with uniform illumination (the training image).

Conclusion

In this letter we proposed four new facial feature extraction techniques which are resistant to the effects of illumination direction changes; out of the proposed methods, the DCT-mod2 technique, which utilizes polynomial coefficients derived from 2D DCT coefficients of spatially neighbouring blocks, is the most suitable. Face verification results on the VidTIMIT database suggest that the DCT-mod2 feature set is superior (in terms of robustness to illumination direction changes and discrimination ability) to features extracted using PCA, PCA with histogram equalization, 2D DCT and 2D Gabor wavelets.

References (32)

  • Castleman, K.R., 1996. Digital Image Processing.
  • Chellappa, R., et al., 1995. Human and machine recognition of faces: a survey. Proc. IEEE.
  • Dempster, A.P., et al., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B.
  • Duc, B., et al., 1999. Face authentication with Gabor information on deformable graphs. IEEE Trans. Image Process.
  • Duda, R.O., et al., 2001. Pattern Classification.
  • Dugelay, J.-L., Junqua, J.-C., Kotropoulos, C., Kuhn, R., Perronnin, F., Pitas, I., 2002. Recent advances in biometric...