2001 | Book

Face Detection and Gesture Recognition for Human-Computer Interaction

Authors: Ming-Hsuan Yang, Narendra Ahuja

Publisher: Springer US

Book Series: The International Series in Video Computing

Included in: Professional Book Archive

About this book

Traditionally, scientific fields have defined boundaries, and scientists work on research problems within those boundaries. However, from time to time those boundaries get shifted or blurred to evolve new fields. For instance, the original goal of computer vision was to understand a single image of a scene, by identifying objects, their structure, and spatial arrangements. This has been referred to as image understanding. Recently, computer vision has gradually been making the transition away from understanding single images to analyzing image sequences, or video understanding. Video understanding deals with understanding of video sequences, e.g., recognition of gestures, activities, facial expressions, etc. The main shift in the classic paradigm has been from the recognition of static objects in the scene to motion-based recognition of actions and events. Video understanding has overlapping research problems with other fields, therefore blurring the fixed boundaries. Computer graphics, image processing, and video databases have obvious overlap with computer vision. The main goal of computer graphics is to generate and animate realistic-looking images and videos. Researchers in computer graphics are increasingly employing techniques from computer vision to generate the synthetic imagery. A good example of this is image-based rendering and modeling techniques, in which geometry, appearance, and lighting are derived from real images using computer vision techniques. Here the shift is from synthesis to analysis followed by synthesis.

Table of Contents

Frontmatter

Chapter 1. Introduction

Abstract

This book is concerned with vision-based interfaces between man and machine. Various aspects of research on intelligent human computer interaction are addressed in the context of computer vision and machine learning.

Ming-Hsuan Yang, Narendra Ahuja

Chapter 2. Detecting Faces in Still Images

Abstract

Images containing faces are essential to intelligent vision-based human computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation, and facial expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face regardless of its three-dimensional position, orientation, and the lighting conditions. Such a problem is challenging because faces are non-rigid and have a high degree of variability in size, shape, color, and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this chapter is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics, and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.

Ming-Hsuan Yang, Narendra Ahuja

Chapter 3. Recognizing Hand Gestures Using Motion Trajectories

Abstract

We present an algorithm for extracting and classifying two-dimensional motion in an image sequence based on motion trajectories. First, a multiscale segmentation is performed to generate homogeneous regions in each frame. Regions between consecutive frames are then matched to obtain 2-view correspondences. Affine transformations are computed from each pair of corresponding regions to define pixel matches. Pixel matches over consecutive image pairs are concatenated to obtain pixel-level motion trajectories across the image sequence. Motion patterns are learned from the extracted trajectories using a time-delay neural network. We apply the proposed method to recognize 40 hand gestures of American Sign Language. Experimental results show that motion patterns in hand gestures can be extracted and recognized with a high recognition rate using motion trajectories.
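
The concatenation step described in this abstract can be illustrated with a minimal sketch. It assumes a single tracked region whose frame-to-frame affine transformations are already known; the function names and the matrix layout are hypothetical choices for illustration, not taken from the book, and the segmentation, region matching, and neural network stages are not shown.

```python
def apply_affine(affine, point):
    """Map a pixel (x, y) through a 2x3 affine transform.

    `affine` is ((a, b, tx), (c, d, ty)), an assumed layout for this sketch.
    """
    (a, b, tx), (c, d, ty) = affine
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def concatenate_matches(start, affines):
    """Chain per-frame pixel matches into one pixel-level trajectory.

    `affines[k]` is assumed to map positions in frame k to frame k + 1.
    """
    trajectory = [start]
    for affine in affines:
        trajectory.append(apply_affine(affine, trajectory[-1]))
    return trajectory


# Hypothetical transforms: a translation followed by a uniform scaling.
frame_to_frame = [
    ((1.0, 0.0, 2.0), (0.0, 1.0, -1.0)),
    ((2.0, 0.0, 0.0), (0.0, 2.0, 0.0)),
]
path = concatenate_matches((1.0, 2.0), frame_to_frame)
```

In a fuller pipeline a pixel may fall into a different matched region in each frame, so the transform applied at each step would depend on the region containing the current position.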

Ming-Hsuan Yang, Narendra Ahuja

Chapter 4. Skin Color Model

Abstract

Human skin color has been used and proved to be an effective feature in many applications from human face detection to hand tracking. However, most studies use either simple thresholding or a single Gaussian distribution to characterize the properties of skin color. Although skin colors of different races fall into a small cluster in normalized RGB or HSV color space, we find that a single Gaussian distribution is neither sufficient to model human skin color nor effective in general applications. Further, previous approaches use small collections of images to estimate the density function but do not validate the models by verifying the statistical fit of the chosen model to the data. The work in this chapter is aimed at estimating the properties of human skin color using the Michigan face database (http://www.engin.umich.edu/faces/) which consists of 2,447 images of human faces from different ethnic groups. More than 9.5 million skin color pixels are used to build a skin color model.
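
For orientation, the single-Gaussian baseline that this abstract argues against can be sketched as follows. The sketch converts a pixel to normalized RGB and evaluates a two-dimensional Gaussian density over the (r, g) components. The function names are hypothetical, and the mean and covariance in the usage lines are made-up placeholders, not values estimated from the Michigan face database.

```python
import math


def normalized_rg(red, green, blue):
    """Return the (r, g) chromaticity of a pixel in normalized RGB."""
    total = red + green + blue
    if total == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # convention chosen here for black pixels
    return (red / total, green / total)


def gaussian_density_2d(x, mean, cov):
    """Evaluate a bivariate Gaussian density with a full 2x2 covariance."""
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    dist2 = (syy * dx * dx - (sxy + syx) * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * dist2) / (2.0 * math.pi * math.sqrt(det))


# Placeholder parameters, for illustration only.
skin_mean = (0.45, 0.30)
skin_cov = ((0.002, 0.0), (0.0, 0.001))
score = gaussian_density_2d(normalized_rg(200, 100, 100), skin_mean, skin_cov)
```

A pixel would be labeled as skin when its density exceeds some threshold. The abstract's point is that one such Gaussian fits real skin color data poorly, which is why the statistical fit of the model should be checked.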

Ming-Hsuan Yang, Narendra Ahuja

Chapter 5. Face Detection Using Multimodal Density Models

Abstract

We present two methods using multimodal density models for face detection in gray level images. One generative method uses a mixture of factor analyzers to concurrently perform clustering and, within each cluster, perform local dimensionality reduction. The parameters of the mixture model are estimated using the EM algorithm. A face is detected if the probability of an input sample is above a predefined threshold. The other discriminative method uses Kohonen’s self organizing map for clustering and Fisher’s Linear Discriminant to find an optimal projection for pattern classification, and a Gaussian distribution to model the class-conditional density function of the projected samples for each class. The parameters of the class-conditional density functions are maximum likelihood estimates, and the decision rule is also based on maximum likelihood. A wide range of face images including ones in different poses, with different expressions and under different lighting conditions are used as the training set to capture the variations of human faces. Our methods have been tested on three data sets with a total of 225 images containing 871 faces. Experimental results on the first two data sets show that our generative and discriminative methods perform as well as the best methods in the literature, yet have fewer false detections. Meanwhile, both methods are able to detect faces of non-frontal views and under more extreme lighting in the third data set.
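
The discriminative method's last two stages, a Fisher projection followed by a maximum likelihood decision over Gaussian class-conditional densities, can be sketched for two classes of two-dimensional points. This is a simplified illustration under stated assumptions: the self organizing map clustering step is omitted, the data are toy points rather than face images, and all function names are hypothetical.

```python
import math


def mean2(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter2(points, m):
    """Within-class scatter entries (sxx, sxy, syy) around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(class_a, class_b):
    """Fisher's Linear Discriminant: w = Sw^{-1} (mean_a - mean_b)."""
    ma, mb = mean2(class_a), mean2(class_b)
    axx, axy, ayy = scatter2(class_a, ma)
    bxx, bxy, byy = scatter2(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)


def fit_gaussian(values):
    """Maximum likelihood mean and variance of projected samples."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


def log_likelihood(v, mu, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (v - mu) ** 2 / (2.0 * var)


def classify(x, w, params_a, params_b):
    """Pick the class whose Gaussian gives the projection higher likelihood."""
    v = w[0] * x[0] + w[1] * x[1]
    if log_likelihood(v, *params_a) >= log_likelihood(v, *params_b):
        return "a"
    return "b"


# Toy training data standing in for the two pattern classes.
class_a = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
class_b = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
w = fisher_direction(class_a, class_b)
params_a = fit_gaussian([w[0] * p[0] + w[1] * p[1] for p in class_a])
params_b = fit_gaussian([w[0] * p[0] + w[1] * p[1] for p in class_b])
```

With equal priors, comparing class-conditional likelihoods in this way is the maximum likelihood decision rule the abstract mentions.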

Ming-Hsuan Yang, Narendra Ahuja

Chapter 6. Learning to Detect Faces with SNoW

Abstract

A novel learning approach for face detection in still images using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a pre-defined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used CMU data sets show that the SNoW-based approach performs well against methods that use neural networks, Bayesian classifiers, Support Vector Machines, and others. To quantify and explain the experimental results, we present a theoretical analysis that shows the advantage of this architecture and traces it to the nature of the update rule used in SNoW, a multiplicative update rule based on the Winnow learning algorithm. In particular, in sparse domains (in which the number of irrelevant features is large) this update rule is shown to be advantageous relative to algorithms that are derived from additive update rules such as Perceptron and Support Vector Machines. We show that learning problems in the visual domain have these sparseness characteristics and exhibit this by analyzing data taken from face detection experiments. Our experiments exhibit good generalization and robustness properties of the SNoW-based method, and conform to the theoretical analysis.
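
The contrast between multiplicative and additive update rules can be made concrete with a minimal sketch of one mistake-driven step for a single linear unit. It follows the textbook forms of Winnow and Perceptron over sparse binary features; the function names, the threshold, and the promotion factor `alpha` are illustrative assumptions, and this is not the SNoW implementation.

```python
def predict(weights, active, theta):
    """Fire (1) when the summed weights of the active features reach theta."""
    return 1 if sum(weights[i] for i in active) >= theta else 0


def winnow_update(weights, active, label, theta, alpha=2.0):
    """Multiplicative rule: on a mistake, scale the active weights."""
    prediction = predict(weights, active, theta)
    if prediction != label:
        factor = alpha if label == 1 else 1.0 / alpha  # promote or demote
        for i in active:
            weights[i] *= factor
    return prediction


def perceptron_update(weights, active, label, theta, rate=1.0):
    """Additive rule: on a mistake, shift the active weights."""
    prediction = predict(weights, active, theta)
    if prediction != label:
        step = rate if label == 1 else -rate
        for i in active:
            weights[i] += step
    return prediction


# One positive example whose active features are indices 0 and 2.
winnow_weights = [1.0, 1.0, 1.0, 1.0]
first = winnow_update(winnow_weights, {0, 2}, label=1, theta=4.0)
second = winnow_update(winnow_weights, {0, 2}, label=1, theta=4.0)

perceptron_weights = [0.0, 0.0, 0.0, 0.0]
perceptron_update(perceptron_weights, {0, 2}, label=1, theta=1.0)
```

Both rules touch only the features active in the misclassified example, so the cost of an update depends on the number of active features rather than on the size of the full feature space.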

Ming-Hsuan Yang, Narendra Ahuja

Chapter 7. Conclusion and Future Work

Abstract

In this book, various aspects of research on intelligent human computer interaction are discussed in the context of computer vision and machine learning. In this chapter, we summarize the contributions of this work and sketch future research directions.

Ming-Hsuan Yang, Narendra Ahuja

Backmatter

Title: Face Detection and Gesture Recognition for Human-Computer Interaction
Authors: Ming-Hsuan Yang, Narendra Ahuja
Copyright Year: 2001
Publisher: Springer US
Electronic ISBN: 978-1-4615-1423-7
Print ISBN: 978-1-4613-5546-5
DOI: https://doi.org/10.1007/978-1-4615-1423-7