Elsevier

Information Sciences

Volume 192, 1 June 2012, Pages 50-70
An adaptive classification system for video-based face recognition

https://doi.org/10.1016/j.ins.2010.02.026

Abstract

In many practical applications, new information may emerge from the environment at different points in time after a classification system has originally been deployed. For instance, in biometric systems, new data may be acquired and used to enroll an individual or to update knowledge of that individual. In this paper, an adaptive classification system (ACS) is proposed for video-based face recognition. It combines a fuzzy ARTMAP neural network classifier, a dynamic particle swarm optimization (DPSO) algorithm, and a long term memory (LTM). A novel DPSO-based learning strategy is also presented for incremental learning of new data with this ACS. This strategy jointly optimizes the classifier weights, architecture, and user-defined hyperparameters such that the classification rate is maximized. Performance of this system is assessed in terms of classification rate and resource requirements for incremental learning of data blocks coming from real-world video data bases. The necessity of an LTM to store validation data is shown empirically for different enrollment and update scenarios. In addition, incremental learning is shown to constitute a dynamic optimization problem where the optimal hyperparameter values change in time. Simulation results indicate that the proposed system can provide a significantly higher classification rate than that of fuzzy ARTMAP alone during incremental learning. However, optimization of ACS parameters requires more resources. The ACS needs several training sequences to produce the optimal solution, and adapting fuzzy ARTMAP parameters according to classification rate tends to require more category neurons and training epochs.

Introduction

Biometric systems seek to recognize individuals from their behavioral or physiological characteristics, such as the face, fingerprint, iris, signature and voice [24]. Unlike the credentials used in current approaches (e.g., passwords, access cards, and identification numbers and cards), these characteristics are unique to each individual and cannot be lost, stolen or reproduced, so they can be used to prevent theft and fraud. There are three types of applications in biometric recognition: verification, identification, and surveillance [24]. In verification applications, an individual enrolled in the system claims an identity and provides a biometric sample. Then, the biometric system seeks to authenticate that the sample corresponds to the model of that specific individual. In contrast, in identification applications, an individual provides a biometric sample, and the system seeks to determine if the sample corresponds to the model of any of the individuals enrolled in the system. Surveillance applications differ slightly from identification in that the sampling process is performed discreetly in an unconstrained scene, and the system seeks to determine if a given biometric sample corresponds to the model of anyone on a restricted list of individuals under surveillance, e.g., screening for criminals or terrorists in an airport setting.

Over the past decade, face recognition has received considerable attention in the area of biometrics due to the wide range of commercial and law enforcement applications, and to the availability of affordable technologies. Video-based face recognition has the advantage over other very reliable biometric characteristics, such as iris and fingerprint scans, that it does not require the cooperation of the individuals involved in the process [45]. It can thus be used for surveillance applications where control of the acquisition conditions is not possible. In addition, unlike applications of image-based face recognition, it is possible to recognize targeted subjects from a sequence of video frames, instead of only one image. As outlined in the following, video-based face recognition for surveillance applications remains a very challenging problem.

A critical function in face recognition systems is the classification of face regions captured in video streams. Typically, face recognition systems employ statistical or neural pattern classifiers to map an input feature space $\mathbb{R}^I$ to a set of $K$ predefined class labels $\Omega = \{C_1, C_2, \ldots, C_K\}$, where each class $k$ ($k = 1, \ldots, K$) corresponds to the face model of an individual enrolled in the biometric system. From the classifier’s perspective, an input pattern $a$ associated with class $k$ is sampled from an unknown probability distribution, $p_k(a)$, over the input feature space $\mathbb{R}^I$. In practical applications, the classifiers are designed a priori, using some prior knowledge of the underlying distributions $p_k(a)$, a set of user-defined hyperparameters (e.g., learning parameter), and a limited amount of learning data.

Since the acquisition (collection and analysis) of such data is expensive and time consuming in many practical applications, the data may be incomplete in one of several ways. In static classification environments, where $p_k(a)$ remain fixed over time, these include a limited number of learning samples, missing components of the input observations, missing class labels during learning, and unfamiliar classes (not present in the learning data set) [20]. Moreover, in video-surveillance applications, learning samples acquired from video streams of unconstrained scenes are generally of poor quality with low resolution. They are also subject to considerable variations due to limited control over operational conditions (e.g., illumination, pose, facial expression, orientation and occlusion). These challenges translate to very complex class distributions $p_k(a)$, mainly due to inter- and intra-class variability. In addition to the previously mentioned challenges, an individual’s physiology may change over time, either temporarily (e.g., haircut, glasses, etc.) or permanently (e.g., ageing). In dynamic classification environments, where class distributions $p_k(a, t)$ vary or drift in time, new information, such as input features and output classes, may suddenly emerge in the $\mathbb{R}^I$ space, and previously acquired data may eventually become obsolete [20], [40], [43]. The overall result is a divergence between the biometric models learned by a classifier and the underlying distributions $p_k(a, t)$, which may significantly degrade performance.

Although learning data is limited, it is common to acquire new data at some point in time after the classifier has originally been trained and deployed for operations. In particular, adaptation of video-based face recognition systems is required during enrollment (new classes are added to the system) and during update (pre-existing classes are refined using the new data). To avoid a growing divergence with the underlying class distributions $p_k(a, t)$, the system should then efficiently adapt its face models as new learning data and knowledge become available.

The majority of statistical and neural pattern classifiers proposed in the literature perform supervised batch learning of a finite data set, and assume a static classification environment. To account for new data, they must accumulate all data in memory and train from the start using all previously acquired learning data. Otherwise, new data may corrupt the classifier’s previously acquired knowledge, and compromise its ability to achieve a high level of generalization during future operations. The memory and time complexity associated with storing and relearning from the start on all cumulative data makes this approach infeasible for several practical applications. Assuming that new learning data is available, a classifier that allows for supervised incremental learning should (1) allow learning of additional information from new data, (2) not require access to the previous learning data, (3) preserve previously acquired knowledge, and (4) accommodate new classes that may be introduced with the new data [38]. Some classifiers proposed in the literature are inherently able to perform supervised incremental learning: the Growing Self-Organizing Networks [15] and the ARTMAP Networks [7]. Other well known classifiers (MLP, SVM, and RBF) have also been modified to perform such learning [8], [36], [39]. In response to new learning data, these classifiers adapt their parameters (e.g., synaptic weights for a neural network) and architecture according to these four incremental learning properties.

In order to mitigate corruption of previous knowledge when learning new data (3rd property), a 5th property should be considered for incremental learning: the classifier should (5) adapt its learning dynamics by adjusting its hyperparameters for accurate and timely recognition. In an unconstrained scene and dynamic classification environment, changes in the feature space are likely to occur over time, and re-adjustment of the classifier hyperparameters is needed. Incremental learning is then defined as a dynamic optimization problem in the hyperparameter space. Furthermore, the authors have shown in [10] that, contrary to the 2nd property, it is necessary to preserve some learning data for the validation process and fitness estimation. If not, adaptation is only performed according to new data, and the classifier is subject to the problem of catastrophic forgetting.
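One way to write this dynamic optimization problem compactly is given below. The notation is an assumption introduced here for illustration and does not appear in the text above: $h$ denotes a vector of classifier hyperparameters, $\mathcal{H}$ the hyperparameter space, and $f(h, t)$ the classification rate estimated on the validation data available at time $t$.

```latex
h^{*}(t) = \arg\max_{h \in \mathcal{H}} f(h, t)
```

Because $f$ depends on $t$, the maximizer $h^{*}(t)$ may move each time new data changes the validation set, so an optimizer has to track it rather than find it once.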

In this paper, an adaptive classification system (ACS) is proposed for video-based face recognition. It combines a fuzzy ARTMAP neural network classifier suitable for incremental learning [6], and a dynamic particle swarm optimization (DPSO) algorithm capable of finding and tracking several local optima in the optimization space [35]. This system also features a long term memory (LTM) used to store and manage a set of data for cross-validation and unbiased estimation of classification rate. A novel DPSO-based learning strategy is also proposed for incremental learning of new data with this ACS. When new data becomes available, this strategy jointly optimizes the classifier weights, architecture, and user-defined hyperparameters such that the classification rate is maximized.
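To make the role of the fuzzy ARTMAP hyperparameters more concrete, the following minimal sketch shows the standard fuzzy ART operations on which fuzzy ARTMAP is built: complement coding, the choice function (parameter alpha), the vigilance test (parameter rho), and the weight update (learning rate beta). It follows the usual textbook formulation rather than the specific implementation used in this paper, and all function names are hypothetical.

```python
def complement_code(a):
    """Map a in [0, 1]^I to A = (a, 1 - a), so that |A| = I for every input."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(x, y):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(xi, yi) for xi, yi in zip(x, y)]


def norm1(x):
    """L1 norm; all components are non-negative here."""
    return sum(x)


def choice(A, w, alpha=0.001):
    """Choice function T_j = |A ^ w_j| / (alpha + |w_j|)."""
    return norm1(fuzzy_and(A, w)) / (alpha + norm1(w))


def passes_vigilance(A, w, rho):
    """Match criterion |A ^ w_j| / |A| >= rho."""
    return norm1(fuzzy_and(A, w)) / norm1(A) >= rho


def learn(A, w, beta=1.0):
    """w_j <- beta * (A ^ w_j) + (1 - beta) * w_j; beta = 1 is fast learning."""
    m = fuzzy_and(A, w)
    return [beta * mi + (1.0 - beta) * wi for mi, wi in zip(m, w)]


# Illustrative values only: one category learns a first pattern, then a
# second pattern is tested against it under two vigilance settings.
A = complement_code([0.2, 0.7])
w = learn(A, [1.0] * 4)          # an uncommitted category starts at all ones
B = complement_code([0.4, 0.5])
accepted_low = passes_vigilance(B, w, rho=0.75)
accepted_high = passes_vigilance(B, w, rho=0.9)
```

In this toy case the match value of the second pattern is about 0.8, so the category accepts it at a vigilance of 0.75 and rejects it at 0.9, in which case a new category would have to be created. This illustrates why hyperparameter values affect both the classification rate and the number of category neurons.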

This study focuses on video-based face recognition applications in which two incremental learning scenarios may occur: enrollment and update. Performance of this system is assessed in terms of classification rate and resource requirements for incremental learning of new data blocks from two real-world video data sets: IIT-NRC [17] and Motion of Body (MoBo) [21]. First, the necessity of storing validation data in LTM is observed empirically by comparing the performance of fuzzy ARTMAP networks trained (1) by using standard hyperparameter values, and (2) by optimizing hyperparameters on each new data block, in both cases with and without LTM. Second, dynamic changes in the fuzzy ARTMAP hyperparameter space are shown to occur in both scenarios during incremental learning. Performance is compared for fuzzy ARTMAP networks trained by optimizing hyperparameters on all new data blocks with (1) dynamic optimization, (2) static optimization, (3) canonical particle swarm optimization, and (4) only on the first data block.

In the next section, a general biometric system for face recognition is presented. In Section 3, the adaptive classification system is described, along with the long term memory used to store and manage validation data, the fuzzy ARTMAP neural network used for classification, and the DPSO algorithm used to optimize its hyperparameters. The data bases, incremental learning scenarios, performance measures and the protocol used for proof-of-concept simulations are then described in Section 4. Finally, experimental results are presented and discussed in Section 5.

Section snippets

Biometrics and face recognition from video sequences

The adaptive classification system proposed in this paper is applied to the recognition of faces in video streams of a video-surveillance application and replaces the classification module and biometric data base of Fig. 1. However, it can also be used in a wide range of real-world pattern recognition applications in which complex and changing environments are modelled using neural and statistical classifiers, but where learning data is limited. In face recognition applications, it is

Adaptive classification system

Fig. 2 depicts the evolution of the adaptive classification system (ACS) proposed in this paper for supervised incremental learning of new data. This novel system is composed of a pattern classifier that is suitable for supervised incremental learning, a dynamic optimization module that tunes the user-defined hyperparameters of the classifier, and a long term memory (LTM) that manages and stores incoming learning data used for validation and fitness evaluation.
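The interplay of the three components can be sketched in a few lines, under strong simplifying assumptions that are not taken from the paper: a one-dimensional threshold rule stands in for the fuzzy ARTMAP classifier, a plain PSO that re-evaluates its stored best positions on each new block stands in for DPSO, and the LTM simply keeps every validation sample it receives. All names and data values are hypothetical.

```python
import random


def classification_rate(h, validation):
    """Fraction of (x, label) pairs for which the stand-in rule 'x > h' is right."""
    hits = sum((x > h) == bool(label) for x, label in validation)
    return hits / len(validation)


def new_swarm(n, rng):
    """Particles over a single hyperparameter h in [0, 1]."""
    swarm = []
    for _ in range(n):
        pos = rng.random()
        swarm.append({"pos": pos, "vel": 0.0,
                      "best_pos": pos, "best_fit": float("-inf")})
    return swarm


def adapt(swarm, fitness, rng, iterations=30, w=0.73, c1=1.5, c2=1.5):
    """Re-optimize h after the fitness landscape may have changed."""
    # Stored fitness values may be stale, so re-evaluate every memory first.
    for p in swarm:
        p["best_fit"] = fitness(p["best_pos"])
    for _ in range(iterations):
        g_pos = max(swarm, key=lambda p: p["best_fit"])["best_pos"]
        for p in swarm:
            r1, r2 = rng.random(), rng.random()
            p["vel"] = (w * p["vel"]
                        + c1 * r1 * (p["best_pos"] - p["pos"])
                        + c2 * r2 * (g_pos - p["pos"]))
            p["pos"] = min(1.0, max(0.0, p["pos"] + p["vel"]))
            fit = fitness(p["pos"])
            if fit > p["best_fit"]:
                p["best_pos"], p["best_fit"] = p["pos"], fit
    return max(swarm, key=lambda p: p["best_fit"])


def run_acs(blocks, seed=0):
    """Process validation blocks in sequence; return (h, rate) after each one."""
    rng = random.Random(seed)
    swarm = new_swarm(10, rng)
    ltm = []                      # long term memory of validation samples
    history = []
    for block in blocks:
        ltm.extend(block)         # assumption: the LTM keeps every sample
        best = adapt(swarm, lambda h: classification_rate(h, ltm), rng)
        history.append((best["best_pos"], best["best_fit"]))
    return history


# Toy blocks of (feature, label) pairs; the second block narrows the range
# of thresholds that separate the two classes.
blocks = [
    [(0.1, 0), (0.2, 0), (0.6, 1), (0.7, 1)],
    [(0.3, 0), (0.35, 0), (0.45, 1), (0.5, 1)],
]
history = run_acs(blocks)
```

In this toy data, a threshold of 0.5 classifies the first block perfectly but scores only 0.75 once the second block has been added to the LTM, which is a small instance of the optimum shifting between blocks. Because the fitness is always computed on the accumulated LTM contents, the search cannot improve on new data at the expense of what was validated earlier.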

When a new block of learning data D

Video data bases

In order to observe the impact on system performance of supervised incremental learning, proof-of-concept simulations are performed with two real-world video data bases for face recognition. The first data base was collected by the Institute for Information Technology of the Canadian National Research Council (IIT-NRC) [17]. It is composed of 22 video sequences captured from 11 individuals positioned in front of a computer. For each individual, two color video sequences of about 15 s are

Experiment (A) – Impact of the LTM for validation data

Figs. 5 and 6 present the average classification rate, compression, and convergence time achieved by the ACS with and without LTM data, for re-optimized hyperparameters (hro(t)) and for standard hyperparameters (hstd), during both incremental learning scenarios. For reference, performance is also shown for hyperparameters re-optimized during batch learning, hroB(t), and for kNN. Tables 3 and 5 show an example of the average confusion matrix for only one of the five class presentation

Conclusion

In this paper, an adaptive classification system (ACS) is proposed for video-based face recognition. It combines a fuzzy ARTMAP neural network classifier, a dynamic particle swarm optimization (DPSO) algorithm, and a long term memory (LTM). This ACS uses a novel DPSO-based learning strategy to jointly optimize the classifier weights, architecture, and user-defined hyperparameters such that the classification rate is maximized during incremental learning of new data. This DPSO-based learning strategy

Acknowledgements

This research was supported in part by the Natural Sciences and Engineering Research Council of Canada. We also wish to thank the reviewers of this paper for their constructive comments.

References (46)

  • T. Blackwell, J. Branke, Multi-swarm optimization in dynamic environments, in: Applications of Evolutionary Computing,...
  • A. Canuto et al., An investigation of the effects of variable vigilance within the RePART neuro-fuzzy network, Journal of Intelligent and Robotic Systems: Theory and Applications (2000)
  • A. Carlisle, G. Dozier, Tracking changing extrema with adaptive particle swarm optimizer, in: World Automation...
  • G.A. Carpenter et al., Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Transactions on Neural Networks (1992)
  • D. Chakraborty et al., A novel training scheme for MLPs to realize proper generalization and incremental learning, IEEE Transactions on Neural Networks (2003)
  • J.-F. Connolly, E. Granger, R. Sabourin, Supervised incremental learning with the fuzzy ARTMAP neural network, in:...
  • J.-F. Connolly, E. Granger, R. Sabourin, Incremental adaptation of fuzzy ARTMAP neural networks for video-based face...
  • A.P. Engelbrecht, Fundamentals of Computational Swarm Intelligence (2005)
  • G.L. Foresti, L. Snidaro, A distributed sensor network for video surveillance of outdoor environments, in: IEEE Proc....
  • B. Fritzke, Growing self-organizing networks – why? in: Proc. of the European Symposium on Artificial Intelligence,...
  • D.O. Gorodnichy, Video-based framework for face recognition in video, in: Second Workshop on Face Processing in Video...
  • E. Granger, J.-F. Connolly, R. Sabourin, A comparison of fuzzy ARTMAP and gaussian ARTMAP neural networks for...
  • E. Granger et al., Supervised learning of fuzzy ARTMAP neural networks through particle swarm optimization, Journal of Pattern Recognition Research (2007)