
Real-Time Imaging

Volume 9, Issue 4, August 2003, Pages 277-287

A novel method for detecting lips, eyes and faces in real time

https://doi.org/10.1016/j.rti.2003.08.003

Abstract

This paper presents a real-time face detection algorithm for locating faces in images and videos. The algorithm finds not only the face regions, but also the precise locations of facial components such as the eyes and lips. It starts from the extraction of skin pixels based upon rules derived from a simple quadratic polynomial model. Interestingly, with a minor modification, this polynomial model is also applicable to the extraction of lips. The benefits of applying these two similar polynomial models are twofold. First, considerable computation time is saved. Second, both extraction processes can be performed simultaneously in one scan of the image or video frame. The eye components are then extracted after the extraction of skin pixels and lips. Afterwards, the algorithm removes falsely extracted components by verifying them against rules derived from the spatial and geometrical relationships of facial components. Finally, the precise face regions are determined accordingly. According to the experimental results, the proposed algorithm exhibits satisfactory performance in terms of both accuracy and speed for detecting faces with wide variations in size, scale, orientation, color, and expression.

Introduction

In recent years, the rapid advancement of image processing techniques and the falling cost of image/video acquisition devices have encouraged the development of many computer vision applications, such as vision-based surveillance, vision-based man–machine interfaces, and vision-based biometrics. Among these applications, face recognition is one of the central tasks attracting the attention of more and more researchers. A number of works in the literature have presented face recognition applications at laboratory and commercial scales [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. One of the important tasks in designing a good face recognition system is the design of an efficient algorithm to locate faces in captured images or video. Face detection is also a central task in applications other than recognition systems. For example, in some video transmission applications, human faces are the only changing foreground objects in the video frames; the repeated encoding, transmission, and decoding of the unchanged background can therefore be avoided to save network bandwidth and computation, and face detection plays a key role in segmenting the faces from the video background. As face detection is always the first step in these recognition or transmission systems, its performance places a strict limit on the performance of the whole system. Ideally, a good face detector should accurately extract all faces in images regardless of their positions, scales, orientations, colors, shapes, poses, expressions, and lighting conditions. However, given the current state of the art in image processing, this goal remains a major challenge. For this reason, many face detectors deal only with upright and frontal faces in well-constrained environments [1], [12], [13], [14], [15], [16].

In addition to accuracy, another important concern is detection speed. For instance, in many video phone and surveillance applications, real-time speed is a critical requirement. This requirement rules out many algorithms that extract faces precisely at the cost of an extensive amount of computation time. High-speed CPUs may provide a hardware solution to the speed requirement; however, the high cost of such powerful CPUs may also reduce the acceptability of these systems to common users.

In this paper, we propose a novel real-time face detection algorithm that can accurately locate both the face regions in images and the eyes and lips of each located face. The detailed capabilities of the proposed algorithm are as follows:

1. Users can tilt their faces left or right by about 45°.

2. Users can raise, lower, or rotate their heads as long as neither the lips nor the eyes are occluded.

3. The sizes of faces are limited to between 1600 (=40×40) and 9216 (=96×96) pixels. This limitation is set to fit the resolution requirements of general face recognition engines; the values can easily be adjusted if different resolutions are demanded.
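For illustration only, the face-size constraint in item 3 amounts to a simple area test; the helper name below is hypothetical, not part of the paper:

```python
# Face-size limits from the specification above: candidate regions are
# accepted only when their pixel area lies between 40x40 and 96x96.
MIN_AREA = 40 * 40   # 1600 pixels
MAX_AREA = 96 * 96   # 9216 pixels

def is_valid_face_size(width, height, min_area=MIN_AREA, max_area=MAX_AREA):
    """Return True when a candidate region's area falls within the limits."""
    return min_area <= width * height <= max_area
```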

We assume that the ambient light illuminates the faces uniformly; that is, we exclude cases in which light is focused on only part of the face. The basic concept of the proposed algorithm is to extract and then verify the desired components, including skin, lips, eyes, and faces, with several simple rules. These rules are found to handle a large degree of variation in faces. Owing to their simplicity and effectiveness, the proposed algorithm can accurately detect faces with wide variations at real-time speed.

The rest of the paper is organized as follows. Section 2 briefly surveys related work. The details of the proposed algorithm are presented in Section 3. To show its effectiveness, experimental results are provided in Section 4, along with a performance evaluation and comparisons in terms of accuracy and speed. Finally, we conclude the paper in Section 5.


Related works

A straightforward approach to detecting faces in images is template correlation matching [1], [2], [3], [4]. The template can be designed by hand or learned from a collection of face patterns. During matching, the template is convolved with subimages at every location in the input image to find possible candidates based on a predefined similarity or distance measure. To handle possible variations in size, orientation, shape, etc., two methods are usually adopted.
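The sliding-window matching described above can be sketched as a direct normalized cross-correlation scan. This is a minimal, unaccelerated illustration (the function name and array shapes are assumptions); practical systems use FFT-based or integral-image accelerations instead of this double loop:

```python
import numpy as np

def ncc_scores(image, template):
    """Slide `template` over `image` (both 2-D grayscale arrays) and
    return a map of normalized cross-correlation scores in [-1, 1].
    Positions whose patch has zero variance are scored -1."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.full((out_h, out_w), -1.0)
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()          # mean-normalize the patch
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom > 0:
                scores[y, x] = (p * t).sum() / denom
    return scores
```

A face candidate would then be reported wherever the score exceeds a similarity threshold; to cope with scale, the scan is typically repeated over an image pyramid, which is one source of the heavy computation this approach incurs.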

Rule-based face detection algorithm

Following the framework of the bottom-up detection approach, the proposed algorithm is designed to extract facial components, including the lips and eyes. To reduce the search area in the input images, the algorithm also extracts skin pixels. However, instead of using probabilistic models, we use a quadratic polynomial model as the color model of skin pixels to reduce computation time. Moreover, we also extend this polynomial model to the extraction of lips.
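As a rough sketch of how such a rule-based color model can work, a quadratic polynomial can bound the skin region in normalized r-g chromaticity space. The coefficients and function names below are illustrative placeholders, not the values derived in the paper:

```python
def normalized_rg(r, g, b):
    """Map an RGB pixel to normalized r-g chromaticity coordinates."""
    s = r + g + b
    if s == 0:
        return 0.0, 0.0
    return r / s, g / s

def is_skin(r, g, b):
    """Classify a pixel as skin when its g chromaticity lies between two
    quadratic curves of r. Coefficients are illustrative placeholders."""
    nr, ng = normalized_rg(r, g, b)
    upper = -1.376 * nr * nr + 1.0743 * nr + 0.2   # hypothetical upper bound
    lower = -0.776 * nr * nr + 0.5601 * nr + 0.18  # hypothetical lower bound
    return lower < ng < upper
```

Because each pixel requires only a few multiplications and comparisons, such a test can be evaluated in a single scan of the frame, which is the source of the speed advantage over probabilistic models.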

Performance evaluation

For the performance evaluation, we implemented the proposed algorithm on a PC with an 800 MHz Pentium III CPU and 128 MB of RAM. The implemented system has two modes of operation: an on-line mode, designed to detect faces in real time in video frames captured from a PC camera, and an off-line mode, designed to detect faces in still images. To evaluate the accuracy and speed of the proposed algorithm, we prepared a test set containing 1000 images. Among

Concluding remarks

According to the experimental results, the proposed algorithm exhibits satisfactory performance in both accuracy and speed. For applications with well-constrained usage and environmental conditions, the algorithm can be further improved in both speed and accuracy by further simplification and refinement of the system design. However, there are still two main restrictions on using the proposed algorithm:

1. The light condition must be normal. In other

Acknowledgements

This work was supported in part by the National Science Council of the Republic of China under Grant NSC-90-2218-E-259-001.

References (37)

  • P.N. Belhumeur et al., "Eigenfaces vs. fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
  • M.J. Er et al., "Face recognition with radial basis function (RBF) neural networks," IEEE Transactions on Neural Networks, 2002.
  • Y.S. Gao et al., "Face recognition using line edge map," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
  • D. Rong et al., "Face recognition algorithm using local and global information," Electronics Letters, 2002.
  • K.I. Kim et al., "Face recognition using kernel principal component analysis," IEEE Signal Processing Letters, 2002.
  • K.I. Kim et al., "Face recognition using support vector machines with local correlation kernels," International Journal of Pattern Recognition and Artificial Intelligence, 2002.
  • R.W. Frischholz et al., "BioID: a multimodal biometric identification system," IEEE Computer, 2000.
  • R. Chellappa et al., "Human and machine recognition of faces: a survey," Proceedings of the IEEE, 1995.