
Pattern Recognition

Volume 33, Issue 9, September 2000, Pages 1525-1539

Real-time face location on gray-scale static images

https://doi.org/10.1016/S0031-3203(99)00130-2

Abstract

This work presents a new approach to automatic face location on gray-scale static images with complex backgrounds. In the first stage, our technique approximately detects the image positions where the probability of finding a face is high; in the second stage, the location accuracy of the candidate faces is improved and their existence is verified. The experimentation shows that the algorithm performs very well both in terms of detection rate (just one missed detection over 70 images) and of efficiency (about 13 images/s processed on an Intel Pentium II 266 MHz).

Introduction

Automatic face location is a very important task which constitutes the first step of a large area of applications: face recognition, face retrieval by similarity, face tracking, surveillance, etc. (e.g. Ref. [1]). In the opinion of many researchers, face location is the most critical step towards the development of practical face-based biometric systems, since its accuracy and efficiency have a direct impact on system usability. Several factors contribute to making this task very complex, especially for applications that must operate in real time on gray-scale static images. Complex backgrounds, illumination changes, pose and expression changes, head rotation in 3D space and different distances between the subject and the camera are the main sources of difficulty.

Many face-location approaches have been proposed in the literature, depending on the type of images (gray-scale images, color images or image sequences) and on the constraints considered (simple or complex background, scale and rotation changes, different illuminations, etc.). Briefly summarizing such a conspicuous number of works requires a pre-classification; unfortunately, the wide variety of techniques adopted by researchers makes this far from easy. Aware of the unavoidable inaccuracies, we propose the following tentative classification:

  • Methods based on template matching with static masks and heuristic algorithms which use images taken at different resolutions (multiresolution approaches) [2], [3].

  • Computational approaches based on deformable templates which characterize the human face [4] or internal features [5], [6], [7], [8]: eyes, nose, mouth. These methods can be conceived as an evolution of the previous class, since the templates can be adapted to the different shapes characterizing the searched objects. The templates are generally defined in terms of geometric primitives like lines, polygons, circles and arcs; a fitness criterion is employed to determine the degree of matching.

  • Face and facial parts detection using dynamic contours or snakes [6], [9], [10], [11]. These techniques involve a constrained global optimization, which usually gives very accurate results but at the same time is computationally expensive.

  • Methods based on elliptical approximation and on face searching via least-squares minimization [12], incremental ellipse fitting [13] and elliptic region growing [14].

  • Approaches based on the Hough transform [7] and the adaptive Hough transform [15].

  • Methods based on the search for a significant group of features (triplets, constellations, etc.) in the context considered: for example, two eyes and a mouth suitably located constitute a significant group in the context of a face [7], [16], [17], [18], [19].

  • Face search on the eigenspace determined via PCA [20] and face location approaches based on the information theory [21], [22].

  • Neural network approaches [23], [24], [25], [26], [27], [28], [29], [30]. The best results have been obtained by using feed-forward networks to classify image portions normalized with respect to scale and illumination. During training, examples of face objects and non-face objects are presented to the network. The main drawback of these methods is their high computational cost, induced by the need to process all possible face positions in the image at several resolutions.

  • Face location on color images through segmentation in a color space: YIQ, YES, HSI, HSV, Farnsworth, etc. [27], [31], [32], [33], [34], [35], [36]. Generally, color information greatly simplifies the localization task: a simple spectrographic analysis shows that face-skin pixels are usually clustered in a color space, so an ad hoc segmentation either isolates the face from the background or at least drastically reduces the amount of information that must be processed in the subsequent stages.

  • Face detection on image sequences using motion information: optical flow, spatio-temporal gradient, etc. [27], [33], [37].

Since in several applications it is mandatory (or preferable) to deal with static gray-scale images, we believe it is important to develop a method that does not exploit additional information such as color and motion. For example, most of the surveillance cameras installed nowadays in shops, banks and airports are still gray-scale cameras (due to their lower cost), and the electronic processing of mug-shot or identity-card databases may require detecting faces on static gray-scale pictures printed on paper. Unfortunately, if we discard color- and motion-based approaches, the most robust methods are generally time-consuming and cannot be used in real-time applications.

The aim of this work is to provide a new method which is capable of processing gray-scale static images in real time. The algorithm must operate with structured backgrounds and must tolerate illumination changes, scale variations and small head rotations.

Our approach (Fig. 1(a)) is based on a location technique which starts by approximately detecting the image positions (or candidate positions) where the probability of finding a face is high (module AL) and then, for each of them, improves the location accuracy and verifies the presence of a true face (module FLFV). Actually, most applications in the field of biometric systems require the detection of just one object in the image (i.e. the foreground object): under this hypothesis, a more efficient implementation of our method is reported in Fig. 1(b), where at each step the module AL passes only the most likely position to FLFV, and FLFV keeps requesting new positions until a valid face is detected or no more candidates are available. It should be noted that, even in this case, the system could be used to detect multiple faces in an image, provided the iterative process is not prematurely interrupted.

Although AL and FLFV have been implemented in very different manners, both modules work on the same kind of data: the directional image extracted from the original gray-scale image.

In Section 2 the directional image is defined and some comments about its computation are reported. Section 3 describes the module AL, which is based on the search for elliptical blobs in the directional image by means of the generalized Hough transform. In Section 4 we present the dynamic-mask-based technique used for fine location and face verification (module FLFV), and in Section 5 we discuss how to combine AL and FLFV in practice in order to implement the functional schema of Fig. 1(b). Section 6 reports the results of our experimentation over a 70-image database; finally, in Section 7, we present our conclusions and discuss future research.

Directional image

Most face-location approaches perform an initial edge extraction by means of a gradient-like operator; a few methods also exploit additional features such as directional information, intensity maxima and minima, etc. Our technique strongly relies on the edge phase angles contained in a directional image.

A directional image is a matrix defined over a discrete grid, superimposed on the gray-scale image, whose elements are in correspondence with the grid nodes. Each element is a vector
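As a concrete illustration of this definition, the following is a minimal sketch, assuming each grid-node vector carries an edge phase angle and a strength; simple central differences stand in for whatever gradient operator is actually used, and the doubled-angle averaging trick keeps opposite gradient directions within a cell from cancelling:

```python
import numpy as np

def directional_image(gray, cell=8):
    """Sketch: one (phase angle, strength) vector per node of a grid
    superimposed on the gray-scale image. The gradient operator and
    cell size are illustrative assumptions, not the paper's choices."""
    gy, gx = np.gradient(gray.astype(float))
    h, w = gray.shape
    H, W = h // cell, w // cell
    phase = np.zeros((H, W))      # edge phase angle at each grid node
    strength = np.zeros((H, W))   # coherence of directions in the cell
    for i in range(H):
        for j in range(W):
            bx = gx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            by = gy[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            mag = np.hypot(bx, by)
            # average on the doubled angle so that opposite gradients
            # (same edge orientation) reinforce instead of cancelling
            s = np.sum(mag * np.exp(1j * np.arctan2(2*bx*by, bx*bx - by*by)))
            phase[i, j] = 0.5 * np.angle(s)
            strength[i, j] = np.abs(s) / (mag.sum() + 1e-9)
    return phase, strength
```

On a uniform intensity ramp, every cell yields the same phase angle with strength close to 1, since all gradients agree.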

AL - approximate location

The analysis of a certain number of directional images suggested the formulation of a simple method for detecting faces. In particular, we noted that when a face is present in an image the corresponding directional image region is characterized by vectors producing an elliptical blob. For this reason, the module AL is based on the search for ellipses on the directional image. Several techniques could be used for this purpose, for example multiresolution template matching [39] and least-squares
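The core idea, elements of the directional image voting for ellipse centers via the generalized Hough transform, can be sketched as follows; for brevity this toy version assumes axis-aligned ellipses and treats each element's phase angle as the border tangent direction, whereas the paper's scheme is richer:

```python
import numpy as np

def hough_ellipse_centers(phase, strength, a_vals, b_vals, thr=0.3):
    """Hedged sketch of the AL idea: each strong directional element
    votes for the centers of the (axis-aligned) ellipses whose border
    tangent would produce its phase angle."""
    H, W = phase.shape
    acc = np.zeros((H, W))                   # accumulator over centers
    for i, j in zip(*np.nonzero(strength > thr)):
        theta = phase[i, j]                  # border tangent angle
        for a in a_vals:
            for b in b_vals:
                # border parameter t whose tangent matches theta
                t = np.arctan2(-b * np.cos(theta), a * np.sin(theta))
                dy, dx = b * np.sin(t), a * np.cos(t)
                for s in (+1, -1):           # tangent is ambiguous mod pi
                    ci = int(round(i - s * dy))
                    cj = int(round(j - s * dx))
                    if 0 <= ci < H and 0 <= cj < W:
                        acc[ci, cj] += strength[i, j]
    return acc
```

Peaks of the accumulator then play the role of the candidate positions that AL hands over to FLFV.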

FLFV - fine location and face verification

Different strategies can be adopted to improve the location accuracy and to verify whether an elliptical object really is a face. Some of the alternatives we explored are reported in the following:

  • Improving the ellipse center location through AHT (Adaptive Hough Transform) [41], [42] which requires the granularity of the hot accumulator cells to be gradually refined.

  • Local optimization of the center [xc,yc], of the semi-axes a and b and of the ellipse tilt angle ξ through a local
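The second alternative, a local search over the five ellipse parameters, can be sketched as a greedy coordinate descent; the fitness function below is a hypothetical stand-in (the paper's actual criterion is not reproduced here), as are the step sizes:

```python
import math

def refine_ellipse(score, xc, yc, a, b, xi, steps=50):
    """Greedy coordinate-descent sketch of locally optimizing the center
    (xc, yc), semi-axes a, b and tilt angle xi. `score` is any fitness
    function (higher is better), e.g. agreement between the ellipse
    border and the directional image."""
    params = [xc, yc, a, b, xi]
    deltas = [1.0, 1.0, 0.5, 0.5, math.radians(2)]  # illustrative steps
    best = score(*params)
    for _ in range(steps):
        improved = False
        for k, d in enumerate(deltas):
            for step in (+d, -d):            # try nudging parameter k
                trial = list(params)
                trial[k] += step
                v = score(*trial)
                if v > best:
                    best, params, improved = v, trial, True
        if not improved:                     # local optimum reached
            break
    return tuple(params), best
```

A multi-scale variant (shrinking the deltas once no step improves) would mirror the granularity refinement mentioned for the AHT alternative.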

Combining AL and FLFV

Depending on the application requirements, there are several ways of adjusting and combining the AL and FLFV modules. Since at this stage our aim is to develop a method capable of efficiently detecting the foreground face, we adopted the functional schema of Fig. 1(b). In particular, the algorithm searches for just one face in the image; it returns the face position [xf, yf] and sizes af, bf in case of detection, and null otherwise. A pseudo-code version of the whole face detection method is reported:
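The loop described above (Fig. 1(b)) can be sketched as follows; the function and parameter names are illustrative, not the paper's:

```python
def detect_face(image, approximate_locations, refine_and_verify):
    """Sketch of the Fig. 1(b) schema. `approximate_locations` (the AL
    module) yields candidate positions most-likely first;
    `refine_and_verify` (FLFV) returns the refined face parameters
    (xf, yf, af, bf) for a true face, or None otherwise."""
    for pos in approximate_locations(image):
        face = refine_and_verify(image, pos)
        if face is not None:
            return face          # detection: position and semi-axes
    return None                  # candidates exhausted: no face found
```

Because AL yields candidates in decreasing order of likelihood, the loop stops as soon as the foreground face is verified, which is what makes this schema efficient for single-face applications.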

 

Experimentation

Experimental results have been produced on a database of 70 images, each of which contains at least one human face (Fig. 11). All the images (384×288 pixels, 256 gray levels) were acquired in offices and laboratories of our department, under different illuminations (sometimes rather critical: backlighting, semidarkness, …) and with the subject at different distances from the camera. In 10 images people wear spectacles. The subjects were required to gaze at the camera. Each of the 70 images was

Conclusions

This work proposes a two-stage approach to face location on gray-scale static images with complex backgrounds. Both modules operate on the elements constituting the directional image, which proved very effective in providing reliable information even in the presence of critical illumination and semidarkness.

The approximate location module searches for the most likely positions in the image by means of a particular implementation of the generalized Hough transform. Great

References (48)

  • L.S. Davis, Hierarchical generalized Hough transform and line segment based generalized Hough transforms, Pattern Recognition (1982).
  • R. Chellappa, S. Sirohey, C.L. Wilson, C.S. Barnes, Human and machine recognition of faces: a survey, Tech. Report...
  • I. Craw, D. Tock, A. Bennet, Finding face features, Proceedings of ECCV,...
  • A. Yuille, D. Cohen, P. Hallinan, Facial features extraction by deformable templates, Tech. Report 88-2, Harvard...
  • K. Lam et al., Locating and extracting the eye in human face images, Pattern Recognition (1996).
  • A. Lanitis, C.J. Taylor, T.F. Cootes, T. Ahmed, Automatic interpretation of human faces and hand gesture using flexible...
  • R. Funayama, N. Yokoya, H. Iwasa, H. Takemura, Facial component extraction by cooperative active nets with global...
  • S.R. Gunn, M.S. Nixon, Snake head boundary extraction using global and local energy minimisation, Proceedings of the...
  • S.A. Sirohey, Human face segmentation and identification, Tech. Report CAR-TR-695, Center for Automation Research,...
  • A. Jacquin, A. Eleftheriadis, Automatic location tracking of faces and facial features in video sequences, Proceedings...
  • R. Herpers, H. Kattner, H. Rodax, G. Sommer, GAZE: an attentive processing strategy to detect and analyze the prominent...
  • V. Govindaraju, S.N. Srihari, D.B. Sher, A computational model for face location, Proceedings of the 3rd ICCV, 1990,...
  • H.P. Graf, T. Chen, E. Petajan, E. Cosatto, Locating faces and facial parts, Proceedings of the International Workshop...
  • M.C. Burl, T.K. Leung, P. Perona, Face localization via shape statistics, Proceedings of the International Workshop on...

About the Author—DARIO MAIO is Full Professor at the Computer Science Department, University of Bologna, Italy. He has published in the fields of distributed computer systems, computer performance evaluation, database design, information systems, neural networks, biometric systems, and autonomous agents. Before joining the Computer Science Department, he received a fellowship from the C.N.R. (Italian National Research Council) for participation in the Air Traffic Control Project. He received the degree in Electronic Engineering from the University of Bologna in 1975. He is an IEEE member. He is with CSITE-C.N.R. and with DEIS; he teaches database and information systems at the Computer Science Dept., Cesena.

About the Author—DAVIDE MALTONI is an Associate Researcher at the Computer Science Department, University of Bologna, Italy. He received the degree in Computer Science from the University of Bologna in 1993. In 1998 he received his Ph.D. in Computer Science and Electronic Engineering at DEIS, University of Bologna, with a dissertation on "Biometric Systems". His research interests also include autonomous agents, pattern recognition and neural nets. He is an IAPR member.
