
Real-Time Imaging

Volume 9, Issue 4, August 2003, Pages 277-287

A novel method for detecting lips, eyes and faces in real time

https://doi.org/10.1016/j.rti.2003.08.003

Abstract

This paper presents a real-time face detection algorithm for locating faces in images and videos. The algorithm finds not only the face regions, but also the precise locations of facial components such as the eyes and lips. It starts from the extraction of skin pixels based upon rules derived from a simple quadratic polynomial model. Interestingly, with a minor modification, this polynomial model is also applicable to the extraction of lips. The benefits of applying these two similar polynomial models are twofold. First, considerable computation time is saved. Second, both extraction processes can be performed simultaneously in one scan of the image or video frame. The eye components are then extracted after the extraction of skin pixels and lips. Afterwards, the algorithm removes falsely extracted components by verifying them against rules derived from the spatial and geometrical relationships of facial components. Finally, the precise face regions are determined accordingly. According to the experimental results, the proposed algorithm exhibits satisfactory performance in terms of both accuracy and speed for detecting faces with wide variations in size, scale, orientation, color, and expression.

Introduction

In recent years, the rapid advancement of image processing techniques and the falling cost of image/video acquisition devices have encouraged the development of many computer vision applications, such as vision-based surveillance, vision-based man–machine interfaces, and vision-based biometrics. Among these applications, face recognition is one of the central tasks attracting the attention of more and more researchers. A number of works in the literature have presented face recognition applications at laboratory and commercial scales [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. One of the important tasks in designing a good face recognition system is the design of an efficient algorithm to locate faces in captured images or video. Face detection is also a central task in applications other than recognition systems. For example, in some video transmission applications, human faces are the only changing foreground objects in the video frames; the repeated encoding, transmission, and decoding of the unchanged background can therefore be avoided to save network bandwidth and computation, and face detection plays a key role in segmenting the faces from the video background. As face detection is always the first step in these recognition or transmission systems, its performance places a strict limit on the performance of the whole system. Ideally, a good face detector should accurately extract all faces in images regardless of their positions, scales, orientations, colors, shapes, poses, expressions, and lighting conditions. However, given the current state of the art in image processing, this goal remains a major challenge. For this reason, many face detectors deal only with upright and frontal faces in well-constrained environments [1], [12], [13], [14], [15], [16].

In addition to accuracy, another important concern is detection speed. For instance, in many video phone and surveillance applications, real-time speed is a critical requirement. This requirement rules out many algorithms that extract faces precisely at the cost of an extensive amount of computation time. High-speed CPUs may provide a hardware solution to the speed requirement; however, the high cost of such powerful CPUs may also reduce the acceptability of these systems to common users.

In this paper, we propose a novel real-time face detection algorithm that can accurately locate both the face regions in images and the eyes and lips of each located face. The detailed capabilities of the proposed algorithm are as follows:

1. Users can tilt their faces left or right by about 45°.

2. Users can raise, lower, or rotate their heads as long as neither the lips nor the eyes are occluded.

3. The sizes of faces are limited to between 1600 (=40×40) and 9216 (=96×96) pixels. This limitation is set to fit the resolution requirements of general face recognition engines; the values can easily be adjusted if different resolutions are demanded.
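For illustration only, the face-size constraint in item 3 amounts to a simple area test; the helper name below is hypothetical, not part of the paper:

```python
# Face-size limits from the specification above: candidate regions are
# accepted only when their pixel area lies between 40x40 and 96x96.
MIN_AREA = 40 * 40   # 1600 pixels
MAX_AREA = 96 * 96   # 9216 pixels

def is_valid_face_size(width, height, min_area=MIN_AREA, max_area=MAX_AREA):
    """Return True when a candidate region's area falls within the limits."""
    return min_area <= width * height <= max_area
```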

We assume that the ambient light illuminates the faces uniformly; that is, we exclude cases in which light is focused on only part of the face. The basic concept of the proposed algorithm is to extract and then verify the desired components, including skin, lips, eyes, and faces, with several simple rules. These rules are found to handle a large degree of variation in faces. Owing to their simplicity and effectiveness, the proposed algorithm can accurately detect faces with wide variations at real-time speed.

The rest of the paper is organized as follows. Section 2 briefly surveys related work. The details of the proposed algorithm are presented in Section 3. To show its effectiveness, experimental results are provided in Section 4, along with a performance evaluation and comparisons in terms of accuracy and speed. Finally, we conclude the paper in Section 5.


Related works

A straightforward approach to detecting faces in images is template correlation matching [1], [2], [3], [4]. The template can be designed by hand or learned from a collection of face patterns. During matching, the template is convolved with subimages at every location in the input image to find possible candidates based on a predefined similarity or distance measure. To handle possible variations in size, orientation, shape, etc., two methods are usually adopted.
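The sliding-window matching described above can be sketched as a direct normalized cross-correlation scan. This is a minimal, unaccelerated illustration (the function name and array shapes are assumptions); practical systems use FFT-based or integral-image accelerations instead of this double loop:

```python
import numpy as np

def ncc_scores(image, template):
    """Slide `template` over `image` (both 2-D grayscale arrays) and
    return a map of normalized cross-correlation scores in [-1, 1].
    Positions whose patch has zero variance are scored -1."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.full((out_h, out_w), -1.0)
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()          # mean-normalize the patch
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom > 0:
                scores[y, x] = (p * t).sum() / denom
    return scores
```

A face candidate would then be reported wherever the score exceeds a similarity threshold; to cope with scale, the scan is typically repeated over an image pyramid, which is one source of the heavy computation this approach incurs.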

Rule-based face detection algorithm

Following the framework of the bottom-up detection approach, the proposed algorithm is designed to extract facial components, including the lips and eyes. To reduce the search area in the input images, the algorithm also extracts skin pixels. However, instead of using probabilistic models, we use a quadratic polynomial model as the color model of skin pixels to reduce computation time. Moreover, we also extend this polynomial model to the extraction of lips.
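As a rough sketch of how such a rule-based color model can work, a quadratic polynomial can bound the skin region in normalized r-g chromaticity space. The coefficients and function names below are illustrative placeholders, not the values derived in the paper:

```python
def normalized_rg(r, g, b):
    """Map an RGB pixel to normalized r-g chromaticity coordinates."""
    s = r + g + b
    if s == 0:
        return 0.0, 0.0
    return r / s, g / s

def is_skin(r, g, b):
    """Classify a pixel as skin when its g chromaticity lies between two
    quadratic curves of r. Coefficients are illustrative placeholders."""
    nr, ng = normalized_rg(r, g, b)
    upper = -1.376 * nr * nr + 1.0743 * nr + 0.2   # hypothetical upper bound
    lower = -0.776 * nr * nr + 0.5601 * nr + 0.18  # hypothetical lower bound
    return lower < ng < upper
```

Because each pixel requires only a few multiplications and comparisons, such a test can be evaluated in a single scan of the frame, which is the source of the speed advantage over probabilistic models.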

Performance evaluation

For the performance evaluation, we implemented the proposed algorithm on a PC with an 800 MHz Pentium III CPU and 128 MB of RAM. The implemented system has two modes of operation: an on-line mode, designed to detect faces in real time in video frames captured from a PC camera, and an off-line mode, designed to detect faces in still images. To evaluate the accuracy and speed of the proposed algorithm, we prepared a test set containing 1000 images. Among

Concluding remarks

According to the experimental results, the proposed algorithm exhibits satisfactory performance in both accuracy and speed. For applications with well-constrained usage and environmental conditions, the algorithm can be further improved in both speed and accuracy by further simplification and refinement of the system design. However, there are still two main restrictions on using the proposed algorithm:

1. The light condition must be normal. In other

Acknowledgements

This work was supported in part by the National Science Council of the Republic of China under Grant NSC-90-2218-E-259-001.

References (37)

  • P.N. Belhumeur et al., "Eigenfaces vs. fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
  • M.J. Er et al., "Face recognition with radial basis function (RBF) neural networks," IEEE Transactions on Neural Networks, 2002.
  • Y.S. Gao et al., "Face recognition using line edge map," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
  • D. Rong et al., "Face recognition algorithm using local and global information," Electronics Letters, 2002.
  • K.I. Kim et al., "Face recognition using kernel principal component analysis," IEEE Signal Processing Letters, 2002.
  • K.I. Kim et al., "Face recognition using support vector machines with local correlation kernels," International Journal of Pattern Recognition and Artificial Intelligence, 2002.
  • R.W. Frischholz et al., "BioID: a multimodal biometric identification system," IEEE Computer, 2000.
  • R. Chellappa et al., "Human and machine recognition of faces: a survey," Proceedings of the IEEE, 1995.