Pattern Recognition Letters

Volume 24, Issue 14, October 2003, Pages 2409–2419
Fast features for face authentication under illumination direction changes

https://doi.org/10.1016/S0167-8655(03)00070-9

Abstract

In this letter we propose a facial feature extraction technique which utilizes polynomial coefficients derived from 2D Discrete Cosine Transform (DCT) coefficients obtained from horizontally and vertically neighbouring blocks. Face authentication results on the VidTIMIT database suggest that the proposed feature set is superior (in terms of robustness to illumination changes and discrimination ability) to features extracted using four popular methods: Principal Component Analysis (PCA), PCA with histogram equalization pre-processing, 2D DCT and 2D Gabor wavelets; the results also suggest that histogram equalization pre-processing increases the error rate and offers no help against illumination changes. Moreover, the proposed feature set is over 80 times faster to compute than features based on Gabor wavelets. Further experiments on the Weizmann database also show that the proposed approach is more robust than 2D Gabor wavelets and 2D DCT coefficients.

Introduction

The field of face recognition can be divided into two areas: face identification and face verification (also known as authentication). A face verification system verifies the claimed identity based on images (or a video sequence) of the claimant’s face; this is in contrast to an identification system, which attempts to find the identity of a given person out of a pool of N people.

Verification systems pervade our everyday life; for example, Automatic Teller Machines (ATMs) employ simple identity verification where the user is asked to enter their password (known only to the user) after inserting their ATM card; if the password matches the one assigned to the card, the user is allowed access to their bank account. However, a verification system such as the one used in an ATM only verifies the validity of the combination of a certain possession (in this case, the ATM card) and certain knowledge (the password). The ATM card can be lost or stolen, and the password can be compromised (e.g. somebody looks over your shoulder while you're keying it in). In order to address this issue, biometric verification methods have emerged, where the password can be either replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. More information about the field of biometrics can be found in papers by Bolle et al. (2002), Dugelay et al. (2002) and Woodward (1997).

Generally speaking, a full face recognition system can be thought of as being comprised of three stages:

1. Face localization and segmentation
2. Normalization
3. The actual face identification/verification, which can be further subdivided into:
   (a) Feature extraction
   (b) Classification

The second stage (normalization) usually involves an affine transformation (Gonzalez and Woods, 1993) to correct for size and rotation, but it can also involve illumination normalization (which may not be necessary if the feature extraction method is robust against varying illumination). In this letter we shall concentrate on the feature extraction part of the last stage.
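To make the size-and-rotation correction concrete, the following is a minimal sketch of geometric normalization driven by two detected eye positions. The canonical eye positions, output size and function name are hypothetical choices for illustration; the paper does not specify these details.

```python
import cv2
import numpy as np

def normalize_face(image, left_eye, right_eye,
                   out_size=(56, 64),                 # (rows, cols), hypothetical
                   canonical=((18, 16), (18, 48))):   # target (y, x) eye positions, hypothetical
    """Correct for size and in-plane rotation via a similarity (affine) warp.

    left_eye/right_eye are (y, x) pixel locations found by a face/eye
    localizer (stage 1); the canonical positions are arbitrary choices.
    """
    (ly, lx), (ry, rx) = left_eye, right_eye
    (cly, clx), (cry, crx) = canonical

    # Rotation angle and scale implied by the two eye positions.
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))
    scale = np.hypot(crx - clx, cry - cly) / np.hypot(rx - lx, ry - ly)

    # Rotate/scale about the eye midpoint, then translate the midpoint
    # to its canonical location. OpenCV expects (x, y) point order.
    mid = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(mid, angle, scale)
    M[0, 2] += (clx + crx) / 2.0 - mid[0]
    M[1, 2] += (cly + cry) / 2.0 - mid[1]

    rows, cols = out_size
    return cv2.warpAffine(image, M, (cols, rows))
```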

There are many approaches to face-based systems, ranging from the ubiquitous Principal Component Analysis (PCA) approach (also known as eigenfaces) (Turk and Pentland, 1991), Dynamic Link Architecture (also known as elastic graph matching) (Duc et al., 1999) and Artificial Neural Networks (Lawrence et al., 1997), to pseudo-2D Hidden Markov Models (HMMs) (Samaria, 1994; Eickeler et al., 2000). Recent surveys on face recognition can be found in papers by Chellappa et al. (1995), Zhang et al. (1997) and Grudin (2000).

The above-mentioned systems differ in terms of the feature extraction procedure and/or the classification technique used. For example, Turk and Pentland (1991) used PCA for feature extraction and a nearest neighbour classifier for recognition. Duc et al. (1999) used biologically inspired 2D Gabor wavelets (Lee, 1996) for feature extraction, while employing the Dynamic Link Architecture as part of the classifier. Eickeler et al. (2000) obtained features using the 2D Discrete Cosine Transform (DCT) and used the pseudo-2D HMM as the classifier.

PCA derived features have been shown to be sensitive to changes in the illumination direction (Belhumeur et al., 1997) causing rapid degradation in verification performance. A study by Zhang et al. (1997) has shown a system employing 2D Gabor wavelet derived features to be robust to moderate changes in the illumination direction; however, Adini et al. (1997) showed that the 2D Gabor wavelet derived features are sensitive to gross changes in the illumination direction.

Belhumeur et al. (1997) proposed robust features based on Fisher’s Linear Discriminant; however, to achieve robustness, the system required face images with varying illumination for training purposes.

As will be shown, 2D DCT based features are also sensitive to changes in the illumination direction. In this letter we introduce four new techniques, which are significantly less affected by an illumination direction change: DCT-delta, DCT-mod, DCT-mod-delta and DCT-mod2. We will show that the DCT-mod2 method, which utilizes polynomial coefficients derived from 2D DCT coefficients of spatially neighbouring blocks, is the most suitable. We then compare the robustness and performance of the DCT-mod2 method against three popular feature extraction techniques: eigenfaces (PCA), PCA with histogram equalization and 2D Gabor wavelets.
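To preview the idea behind the delta-based methods before their formal definition in Section 2, the sketch below replaces the first few zig-zag ordered 2D DCT coefficients of each block (which carry most of the illumination-sensitive low-frequency energy) with horizontal and vertical deltas computed from neighbouring blocks. The use of simple central differences, the choice of replacing three coefficients and the array layout are illustrative assumptions; the paper's deltas are polynomial (regression-based) coefficients.

```python
import numpy as np

def dct_mod2_like_features(block_dct, n_replaced=3):
    """Illustrative DCT-mod2-style features.

    block_dct: array of shape (rows, cols, n_coefs) holding the zig-zag
    ordered DCT coefficients of each non-overlapping block. For each
    interior block, the first n_replaced coefficients are replaced by
    horizontal and vertical deltas (here: central differences across
    neighbouring blocks, an assumption).
    """
    rows, cols, _ = block_dct.shape
    feats = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Deltas over horizontally / vertically neighbouring blocks
            # for the low-order (illumination-sensitive) coefficients.
            dh = (block_dct[r, c + 1, :n_replaced]
                  - block_dct[r, c - 1, :n_replaced]) / 2.0
            dv = (block_dct[r + 1, c, :n_replaced]
                  - block_dct[r - 1, c, :n_replaced]) / 2.0
            rest = block_dct[r, c, n_replaced:]
            feats.append(np.concatenate([dh, dv, rest]))
    return np.array(feats)
```

For example, with 15 coefficients per block this layout yields 18-dimensional vectors (3 horizontal deltas, 3 vertical deltas and the 12 remaining coefficients).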

The rest of the letter is organized as follows. In Section 2 we briefly review the 2D DCT feature extraction technique and describe the proposed feature extraction methods which build from the 2D DCT. In Section 3 we describe a Gaussian Mixture Model (GMM) based classifier which shall be used as the basis for experiments. The performance of the traditional and proposed feature extraction techniques is compared in Section 4, using an artificial illumination direction change. Section 5 is devoted to experiments on the Weizmann database (Adini et al., 1997) which has more realistic illumination direction changes.

To keep consistency with traditional matrix notation, pixel locations (and image sizes) are described using the row(s) first, followed by the column(s).


2D discrete cosine transform (DCT)

Here the given face image is analyzed on a block-by-block basis. Given an image block $f(y,x)$, where $y,x=0,1,\ldots,N_P-1$ (here we use $N_P=8$), we decompose it in terms of orthogonal 2D DCT basis functions (see Fig. 1). The result is an $N_P\times N_P$ matrix $C(v,u)$ containing 2D DCT coefficients:

$$C(v,u)=\alpha(v)\,\alpha(u)\sum_{y=0}^{N_P-1}\sum_{x=0}^{N_P-1}f(y,x)\,\beta(y,x,v,u)\qquad\text{for }v,u=0,1,2,\ldots,N_P-1,$$

where

$$\alpha(v)=\begin{cases}\sqrt{1/N_P}&\text{for }v=0\\ \sqrt{2/N_P}&\text{for }v=1,2,\ldots,N_P-1\end{cases}$$

and

$$\beta(y,x,v,u)=\cos\!\left[\frac{(2y+1)v\pi}{2N_P}\right]\cos\!\left[\frac{(2x+1)u\pi}{2N_P}\right]$$

The coefficients are ordered according to a zig-zag pattern, reflecting the amount of information stored in each coefficient.
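A minimal sketch of the block decomposition above: SciPy's orthonormal DCT-II reproduces the $\alpha(v)\,\alpha(u)$ scaling, and the zig-zag ordering follows the usual JPEG-style traversal of anti-diagonals. The number of retained coefficients is left as a parameter, since the snippet above does not state it.

```python
import numpy as np
from scipy.fft import dctn

NP = 8  # block size used in the paper

def zigzag_indices(n=NP):
    """(v, u) index pairs ordered zig-zag along anti-diagonals,
    i.e. from low to high spatial frequencies."""
    idx = [(v, u) for v in range(n) for u in range(n)]
    return sorted(idx, key=lambda vu: (vu[0] + vu[1],
                                       vu[0] if (vu[0] + vu[1]) % 2 else -vu[0]))

def block_dct_features(image, n_coefs=15):
    """Zig-zag ordered 2D DCT coefficients of each non-overlapping
    NP x NP block; n_coefs (how many low-frequency coefficients
    to keep) is a free parameter here (assumption)."""
    order = zigzag_indices()[:n_coefs]
    h, w = image.shape
    feats = []
    for y in range(0, h - NP + 1, NP):
        for x in range(0, w - NP + 1, NP):
            # 'ortho' normalization matches the alpha(v) alpha(u) terms.
            C = dctn(image[y:y + NP, x:x + NP].astype(float), norm='ortho')
            feats.append([C[v, u] for v, u in order])
    return np.array(feats)
```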

GMM based classifier

Given a claim for person $C$'s identity and a set of feature vectors $X=\{x_i\}_{i=1}^{N_V}$ supporting the claim, the average log likelihood of the claimant being the true claimant is calculated using:

$$L(X|\lambda_C)=\frac{1}{N_V}\sum_{i=1}^{N_V}\log p(x_i|\lambda_C)$$

where

$$p(x|\lambda)=\sum_{j=1}^{N_G}m_j\,\mathcal{N}(x;\mu_j,\Sigma_j)\qquad\text{and}\qquad\lambda=\{m_j,\mu_j,\Sigma_j\}_{j=1}^{N_G}$$

Here, $\mathcal{N}(x;\mu,\Sigma)$ is a $D$-dimensional Gaussian function with mean $\mu$ and diagonal covariance matrix $\Sigma$:

$$\mathcal{N}(x;\mu,\Sigma)=\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\exp\!\left[-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right]$$

$\lambda_C$ is the parameter set for person $C$, $N_G$ is the number of Gaussians and $m_j$ is the weight for Gaussian $j$ (with constraints $\sum_{j=1}^{N_G}m_j=1$ and $m_j\geqslant 0$).
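A direct sketch of the scoring rule above for a diagonal-covariance GMM; fitting the parameter set λ (typically via the EM algorithm, Dempster et al., 1977) is assumed to have been done elsewhere, so the parameters are taken as given.

```python
import numpy as np

def avg_log_likelihood(X, weights, means, variances):
    """L(X | lambda_C): mean log-likelihood of feature vectors X
    (shape (NV, D)) under a diagonal-covariance GMM with NG components.

    weights: (NG,), means: (NG, D), variances: (NG, D) diagonal terms.
    """
    X = np.asarray(X, dtype=float)
    D = X.shape[1]
    # log N(x; mu_j, Sigma_j) for every (vector, component) pair -> (NV, NG)
    diff = X[:, None, :] - means[None, :, :]             # (NV, NG, D)
    log_norm = -0.5 * (D * np.log(2 * np.pi)
                       + np.log(variances).sum(axis=1))  # (NG,)
    log_gauss = log_norm[None, :] - 0.5 * (diff**2 / variances[None, :, :]).sum(axis=2)
    # log p(x | lambda) = logsumexp_j [ log m_j + log N(x; mu_j, Sigma_j) ]
    log_p = np.logaddexp.reduce(np.log(weights)[None, :] + log_gauss, axis=1)
    return log_p.mean()
```

A claim would then be accepted when L(X|λ_C) exceeds a decision threshold; for a model fitted with scikit-learn, GaussianMixture(covariance_type='diag').score(X) returns the same per-sample average log-likelihood.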

VidTIMIT audio-visual database

The VidTIMIT database (Sanderson, 2002) is comprised of video and corresponding audio recordings of 43 people (19 female and 24 male) reciting short sentences. It was recorded in 3 sessions, with a mean delay of 7 days between Sessions 1 and 2, and 6 days between Sessions 2 and 3. There are 10 sentences per person; the first six sentences are assigned to Session 1, the next two to Session 2 and the remaining two to Session 3. The mean duration of each sentence is 4.25 seconds.

Experiments on the Weizmann database

The experiments described in Section 4 utilized an artificial illumination direction change. In this section we shall compare the performance of 2D DCT, 2D Gabor and DCT-mod2 feature sets on the Weizmann database (Adini et al., 1997), which has more realistic illumination direction changes.

It must be noted that the database is rather small, as it is comprised of images of 27 people; moreover, for the direct frontal view, there is only one image per person with uniform illumination (the training image).

Conclusion

In this letter we proposed four new facial feature extraction techniques which are resistant to the effects of illumination direction changes; out of the proposed methods, the DCT-mod2 technique, which utilizes polynomial coefficients derived from 2D DCT coefficients of spatially neighbouring blocks, is the most suitable. Face verification results on the VidTIMIT database suggest that the DCT-mod2 feature set is superior (in terms of robustness to illumination direction changes and discrimination ability) to features extracted using PCA, PCA with histogram equalization, 2D DCT and 2D Gabor wavelets.

References (32)

  • Castleman, K.R., 1996. Digital Image Processing.
  • Chellappa, R., et al., 1995. Human and machine recognition of faces: a survey. Proc. IEEE.
  • Dempster, A.P., et al., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B.
  • Duc, B., et al., 1999. Face authentication with Gabor information on deformable graphs. IEEE Trans. Image Process.
  • Duda, R.O., et al., 2001. Pattern Classification.
  • Dugelay, J.-L., Junqua, J.-C., Kotropoulos, C., Kuhn, R., Perronnin, F., Pitas, I., 2002. Recent advances in biometric...