
2001 | Book

Face Image Analysis by Unsupervised Learning

Author: Marian Stewart Bartlett

Publisher: Springer US

Book series: The International Series in Engineering and Computer Science


About this book

Face Image Analysis by Unsupervised Learning explores adaptive approaches to image analysis. It draws upon principles of unsupervised learning and information theory to adapt processing to the immediate task environment. In contrast to more traditional approaches to image analysis in which relevant structure is determined in advance and extracted using hand-engineered techniques, Face Image Analysis by Unsupervised Learning explores methods that have roots in biological vision and/or learn about the image structure directly from the image ensemble. Particular attention is paid to unsupervised learning techniques for encoding the statistical dependencies in the image ensemble.
The first part of this volume reviews unsupervised learning, information theory, independent component analysis, and their relation to biological vision. Next, a face image representation using independent component analysis (ICA) is developed, which is an unsupervised learning technique based on optimal information transfer between neurons. The ICA representation is compared to a number of other face representations including eigenfaces and Gabor wavelets on tasks of identity recognition and expression analysis. Finally, methods for learning features that are robust to changes in viewpoint and lighting are presented. These studies provide evidence that encoding input dependencies through unsupervised learning is an effective strategy for face recognition.
Face Image Analysis by Unsupervised Learning is suitable as a secondary text for a graduate-level course, and as a reference for researchers and practitioners in industry.

Table of contents

Frontmatter
Chapter 1. Summary
Abstract
One of the challenges of teaching a computer to recognize faces is that we do not know a priori which features, and which high-order relations among those features, to parameterize. Our insight into our own perceptual processing is limited. For example, image features such as the distance between the eyes, or curves fitted to the eyes, give only moderate performance for face recognition by computer. Much can be learned about image recognition from biological vision. A source of information that appears to be crucial for shaping biological vision is the statistical dependencies in the visual environment. This information can be extracted through unsupervised learning. Unsupervised learning finds adaptive image features that are specialized for a class of images, such as faces.
Marian Stewart Bartlett
Chapter 2. Introduction
Abstract
How can a perceptual system learn to recognize properties of its environment without being told which features it should analyze, or whether its decisions are correct? When there is no external teaching signal to be matched, some other goal is required to force a perceptual system to extract underlying structure. Unsupervised learning is related to Gibson’s concept of discovering “affordances” in the environment (Gibson, 1986). Structure and information are afforded by the external stimulus, and it is the task of the perceptual system to discover this structure. The perceptual system must learn about the underlying physical causes of observed images. One approach to self-organization is to build generative models that are likely to have produced the observed data. The parameters of these generative models are adjusted to optimize the likelihood of the data within constraints such as basic assumptions about the model architecture. A second class of objectives is related to information preservation and redundancy reduction. These approaches are reviewed here. The two approaches to unsupervised learning are not mutually exclusive, and it is often possible, as will be seen below, to ascribe a generative architecture to an information preservation objective, and to build generative models with objectives of information preservation. See (Becker and Plumbley, 1996) for a thorough discussion of unsupervised learning. Hinton and Sejnowski’s Unsupervised Learning: Foundations of Neural Computation (Hinton and Sejnowski, 1999) contains an anthology of many of the works reviewed in this chapter. A recommended background text is Dana Ballard’s Introduction to Natural Computation (Ballard, 1997).
Marian Stewart Bartlett
Chapter 3. Independent Component Representations for Face Recognition
Abstract
In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels. A number of face recognition algorithms employ principal component analysis (PCA), which is based on the second-order statistics of the image set, and does not address high-order statistical dependencies such as the relationships among three or more pixels. Independent component analysis (ICA) is a generalization of PCA which separates the high-order moments of the input in addition to the second-order moments. ICA was performed on a set of face images by an unsupervised learning algorithm derived from the principle of optimal information transfer through sigmoidal neurons (Bell and Sejnowski, 1995). The algorithm maximizes the mutual information between the input and the output, which produces statistically independent outputs under certain conditions. ICA was performed on the face images under two different architectures, one which separated images across spatial location, and a second which separated the feature code across images. The first architecture provided a statistically independent basis set for the face images that can be viewed as a set of independent facial feature images. The second architecture provided a factorial code, in which the probability of any combination of features can be obtained from the product of their individual probabilities. Both ICA representations were superior to representations based on principal component analysis for recognizing faces across days and changes in expression.
Marian Stewart Bartlett
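
The sketch below illustrates the two ICA architectures described in this chapter. It uses scikit-learn's FastICA as a stand-in for the Bell and Sejnowski (1995) infomax algorithm that the book actually employs, and the faces array, its dimensions, and the number of components are placeholder assumptions rather than the book's data or settings.

    # Illustrative sketch only: FastICA stands in for the Bell-Sejnowski infomax
    # algorithm, and `faces` is placeholder data rather than the book's image set.
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    n_images, n_pixels = 100, 60 * 50
    faces = rng.random((n_images, n_pixels))      # rows = aligned, vectorized face images
    faces -= faces.mean(axis=0)                   # zero-mean each pixel across the ensemble

    # Architecture I: pixels are the observations and images are the mixed signals.
    # The recovered sources are spatially independent basis images; each face is
    # then represented by its row of mixing coefficients.
    ica1 = FastICA(n_components=20, random_state=0)
    basis_images = ica1.fit_transform(faces.T).T  # (20, n_pixels) independent basis images
    coeffs_arch1 = ica1.mixing_                   # (n_images, 20) coefficients per face

    # Architecture II: images are the observations and pixels are the variables.
    # The recovered sources form a factorial code: each face gets a coefficient
    # vector whose elements are statistically independent across the ensemble.
    ica2 = FastICA(n_components=20, random_state=0)
    coeffs_arch2 = ica2.fit_transform(faces)      # (n_images, 20) factorial code
    basis_arch2 = ica2.components_                # (20, n_pixels) corresponding basis

Recognition would then proceed by comparing these coefficient vectors, for example with a nearest-neighbor rule; that choice is illustrative and not necessarily the procedure used in the book.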
Chapter 4. Automated Facial Expression Analysis
Abstract
The ability to recognize facial signals is essential for natural communication between humans, yet until recently it has been absent from computer systems. Within the past decade, significant advances have enabled computer systems to understand and use this natural form of human communication. Because most investigators have limited their analysis to a small set of posed expressions, however, the generalizability of these systems to real-world applications is low. Here we present an approach to automatic facial expression analysis based on the Facial Action Coding System (FACS), which objectively measures facial expressions by decomposing them into component actions. FACS coding is presently performed by expert human observers, not computers. An automated facial action coding system would have a wide range of applications in behavioral science, medicine, and human-computer interaction. This chapter reviews the state of the art in automated facial expression analysis, describes the Facial Action Coding System, and outlines our approach to automating FACS.
Marian Stewart Bartlett
Chapter 5. Image Representations for Facial Expression Analysis: Comparative Study I
Abstract
Facial expressions provide an important behavioral measure for the study of emotion, cognitive processes, and social interaction. The Facial Action Coding System (Ekman and Friesen, 1978) is an objective method for quantifying facial movement in terms of component actions. We applied computer image analysis to the problem of automatically detecting facial actions in sequences of images. In our first study we compared three approaches: holistic spatial analysis (eigenfaces), explicit measurement of features such as wrinkles, and estimation of motion flow fields. The three methods were combined in a hybrid system which classified six upper facial actions with 91% accuracy, including low-, medium-, and high-magnitude facial actions. The hybrid system outperformed human non-experts on this task and performed as well as highly trained experts. These comparisons supported the theory that unsupervised feature extraction based on dependencies in the image ensemble is more effective for face image analysis than explicit measurement of facial features.
Marian Stewart Bartlett
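
As an illustration of the holistic (eigenface-style) branch of the hybrid system, the sketch below projects difference images onto principal components and classifies them with a nearest-neighbor rule. The arrays, dimensions, and classifier choice are assumptions for illustration, not the study's actual pipeline.

    # Sketch of the holistic (eigenface-style) branch only: difference images
    # (expression frame minus neutral frame) are projected onto principal
    # components and classified with a nearest-neighbor rule.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    n_train, n_test, n_pixels = 120, 30, 48 * 48
    train_diff = rng.random((n_train, n_pixels))      # placeholder difference images
    test_diff = rng.random((n_test, n_pixels))
    train_actions = rng.integers(0, 6, size=n_train)  # six upper-face action labels

    pca = PCA(n_components=30).fit(train_diff)        # holistic "eigenface" basis
    clf = KNeighborsClassifier(n_neighbors=1).fit(pca.transform(train_diff), train_actions)
    predicted = clf.predict(pca.transform(test_diff))

In the study this holistic branch was combined with explicit feature measurement and motion flow estimates; only the holistic component is sketched here.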
Chapter 6. Image Representations for Facial Expression Analysis: Comparative Study II
Abstract
The Facial Action Coding System (FACS) (Ekman and Friesen, 1978) is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This chapter explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations, and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96% accuracy for classifying twelve facial actions of the upper and lower face. The results provide converging evidence for the importance of local filters, high spatial frequencies, and high-order dependencies for classifying facial actions.
Marian Stewart Bartlett
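
The sketch below shows one plausible form of the Gabor wavelet representation compared in this chapter: a bank of filters at several spatial frequencies and orientations whose response magnitudes are concatenated into a feature vector. The specific frequencies, orientations, image size, and the use of scikit-image are assumptions for illustration, not the book's parameters.

    # Illustrative Gabor filter bank: magnitudes of the responses at several
    # frequencies and orientations are concatenated into one feature vector.
    import numpy as np
    from skimage.filters import gabor

    rng = np.random.default_rng(0)
    image = rng.random((48, 48))                     # placeholder aligned face image

    frequencies = [0.10, 0.15, 0.22, 0.33, 0.50]     # cycles per pixel (assumed values)
    orientations = np.arange(8) * np.pi / 8          # 8 orientations

    features = []
    for f in frequencies:
        for theta in orientations:
            real, imag = gabor(image, frequency=f, theta=theta)
            features.append(np.hypot(real, imag).ravel())   # magnitude response
    feature_vector = np.concatenate(features)        # input to an action classifier

The resulting vector would then serve as input to the facial action classifier.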
Chapter 7. Learning Viewpoint Invariant Representations of Faces in an Attractor Network
Abstract
In natural visual experience, different views of an object or face tend to appear in close temporal proximity as an animal manipulates the object or navigates around it, or as a face changes expression or pose. A set of simulations is presented which demonstrates how viewpoint invariant representations of faces can be developed from visual experience by capturing the temporal relationships among the input patterns. The simulations explored the interaction of temporal smoothing of activity signals with Hebbian learning (Földiák, 1991) in both a feedforward layer and a second, recurrent layer of a network. The feedforward connections were trained by Competitive Hebbian Learning with temporal smoothing of the post-synaptic unit activities (Bartlett and Sejnowski, 1996b). The recurrent layer was a generalization of a Hopfield network with a lowpass temporal filter on all unit activities. The combination of basic Hebbian learning with temporal smoothing of unit activities produced an attractor network learning rule that associated temporally proximal input patterns into basins of attraction. These two mechanisms were demonstrated in a model that took gray-level images of faces as input. Following training on image sequences of faces as they changed pose, multiple views of a given face fell into the same basin of attraction, and the system acquired representations of faces that were approximately viewpoint invariant.
Marian Stewart Bartlett
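
The sketch below gives a minimal version of competitive Hebbian learning with a temporal trace on the post-synaptic activities, in the spirit of the feedforward mechanism described above (Földiák, 1991). The learning rate, trace constant, random inputs, and winner-take-all competition are illustrative assumptions rather than the simulation details reported in the chapter.

    # Minimal trace-rule sketch: a lowpass-filtered ("trace") activity drives a
    # Hebbian update, so inputs that occur close together in time are pulled
    # toward the same unit's weight vector.
    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, n_units, n_steps = 100, 10, 500
    W = rng.random((n_units, n_inputs))
    W /= np.linalg.norm(W, axis=1, keepdims=True)    # normalized weight vectors

    alpha, lam = 0.05, 0.8                           # learning rate, trace decay (assumed)
    trace = np.zeros(n_units)                        # lowpass-filtered unit activities

    for t in range(n_steps):
        x = rng.random(n_inputs)                     # stand-in for the next frame of a pose sequence
        y = np.zeros(n_units)
        y[np.argmax(W @ x)] = 1.0                    # winner-take-all competition
        trace = lam * trace + (1.0 - lam) * y        # temporal smoothing of activity
        W += alpha * trace[:, None] * (x[None, :] - W)   # Hebbian update toward recent inputs
        W /= np.linalg.norm(W, axis=1, keepdims=True)

Because the trace decays slowly, a winning unit's weights are drawn toward a running average of temporally adjacent inputs, which is what allows different views appearing close together in time to become grouped onto the same unit.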
Chapter 8. Conclusions and Future Directions
Abstract
Horace Barlow has argued that redundancy in the sensory input contains structural information about the environment. Completely non-redundant stimuli are indistinguishable from random noise, and the percept of structure is driven by the dependencies (Barlow, 1989). According to Barlow’s theory, what is important for a system to be able to detect is new regularities that differ from the environment to which the system has been adapted. These are what Barlow refers to as “suspicious coincidences.” Learning mechanisms that encode the dependencies that are expected in the input and remove them from the output better enable a system to detect these new regularities in the environment. Independence facilitates the detection of high-order relationships that characterize an object because the prior probability of any particular high order combination of features is low. Incoming sensory stimuli are automatically compared against the null hypothesis of statistical independence, and suspicious coincidences signaling a new causal factor can be more reliably detected. A number of unsupervised learning algorithms have been devised that attempt to learn the structure of the input by employing an objective of reducing statistical dependencies between coding elements.
Marian Stewart Bartlett
Backmatter
Metadata
Title
Face Image Analysis by Unsupervised Learning
Author
Marian Stewart Bartlett
Copyright year
2001
Publisher
Springer US
Electronic ISBN
978-1-4615-1637-8
Print ISBN
978-1-4613-5653-0
DOI
https://doi.org/10.1007/978-1-4615-1637-8