Elsevier

Neurocomputing

Volume 73, Issues 4–6, January 2010, Pages 827-839

Incremental tensor biased discriminant analysis: A new color-based visual tracking method

https://doi.org/10.1016/j.neucom.2009.10.013

Abstract

Most existing color-based tracking algorithms use the statistical color information of the object as the tracking cue, without maintaining the spatial structure within a single chromatic image. Recent research on multilinear algebra makes it possible to preserve the spatial structural relationships in a representation of image ensembles. In this paper, a third-order color tensor is constructed to represent the object to be tracked. To account for the influence of environmental changes on tracking, biased discriminant analysis (BDA) is extended to tensor biased discriminant analysis (TBDA) for distinguishing the object from the background. In addition, an incremental scheme for TBDA is developed for online learning of the tensor biased discriminant subspace, which adapts to appearance variations of both the object and the background. The experimental results show that the proposed method can precisely track objects undergoing large pose, scale and lighting changes, as well as partial occlusion.

Introduction

Visual tracking is an essential component of visual perception and has been an active research topic in the computer vision community for decades. Because of environmental changes and object motion, the appearance of the object is both varied and variable, which makes it challenging to describe the object effectively.

Many visual tracking methods have been developed that extract various low-level features (e.g., color [1], [3], [7], shape [2], [6], texture and contour [4], [5]) and build object appearance models (e.g., spatial histograms [1], [2], [7], AAMs [10], and subspace methods [11], [12], [13], [14], [15], [16], [17]). Birchfield [1] and Wang et al. [2] presented facial models that integrate shape and color. Hayashi and Fujiyoshi [3] developed a mean-shift-based color tracking method that handles luminance changes. Isard and Blake [4] proposed conditional density propagation of a parametric spline curve. Cootes et al. [6] combined shape with appearance representations for tracking. Perez et al. [7] presented multi-part color modeling to capture the rough spatial layout ignored by global histograms, without taking the background color into account. Comaniciu et al. [8] proposed an object tracking method based on a kernel method and the mean shift algorithm. Avidan [9] developed ensemble tracking, which uses AdaBoost to distinguish the object from the background. These methods usually use the low-level features of the image but ignore high-level semantic knowledge. Moreover, they often assume that the object is consistent and similar with respect to texture, gradient and so on, and they produce segmentation-like tracking results. However, the corresponding locations/pixels within these segmentation-like tracked image patches usually lack coherent meaning in context.

To obtain more accurate results, a series of appearance-based methods have recently been developed. Gross et al. [10] employed AAMs to track faces efficiently in videos containing occlusion, but a complicated training process is required before tracking. Subspace-based methods have also been developed recently and applied widely in many research areas [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35]. Building on the success of EigenTracking and the incremental extension of subspace learning, Ross et al. [11] presented adaptive probabilistic visual tracking that updates the subspace incrementally. Lin et al. [12] proposed a discriminative generative model for visual tracking. On the basis of Lin's work, Shen et al. [13] developed a kernelized version for tracking. Yuan et al. [32] applied incremental principal component analysis to scene segmentation for visual surveillance. Because of the limitations of this image-as-vector way of building a subspace, a new representation based on high-order tensors has drawn many researchers’ interest and has been introduced into tracking and recognition. Tao et al. [18], [33] introduced the multilinear representation for gait recognition. Moreover, Sun et al. [24] proposed incremental tensor analysis for learning from dynamic/online data. Later, a series of discriminant methods [21], [27], [28], [31] for tensor analysis were developed, along with their applications in retrieval [21], [31], video semantics [30], and gait recognition [28], [33]. Tao et al. [23], [25] proposed a Bayesian tensor analysis method and applied it to 3-D face modeling, as well as kernelized [22] and probabilistic [26] versions. Li and Lee [14] presented motion saliency-based visual tracking. Shao et al. [15] developed an appearance-based method using the three-dimensional trilinear tensor. Li et al. [16] employed three-dimensional temporal tensor subspace learning for visual tracking. It should be noted that the appearance-based tracking algorithms above have the following characteristics:

  • They use only the intensity information [5], [11], [12], [13], [15], [16], [17], and thus lose the chromatic information in color video sequences.

  • They usually regard visual tracking as a two-class Fisher discriminant problem [12], [13], whereas the classification between the object and the background should be a (1+x)-class formulation.

In visual tracking, the object belongs to the positive sample set, while the background belongs to the negative sample set relative to the object of interest. Although the appearances of both the object and the background vary with time, they vary in very different ways. Since the object undergoes only pose, scale and illumination changes, its changing appearances remain similar to some degree during tracking and can be regarded as a single class, shown as the blue symbols in Fig. 1. The background, however, changes drastically as the object moves, as shown by the orange symbols in Fig. 1. It is therefore unreasonable to assign the background to one class. With this consideration, biased discriminant analysis (BDA) was developed by Zhou and Huang [19], [20], but it also leads to the small sample size (SSS) problem. Tao et al. [21] proposed a direct kernel biased discriminant analysis to deal with the SSS problem and to provide a relevance feedback scheme for content-based image retrieval.
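The biased formulation described above can be illustrated with a minimal vector-based sketch. It assumes the commonly stated form of BDA, in which the scatter of the negative samples is measured around the positive mean rather than around their own mean. The function name `bda_projection`, the ridge term `reg` (a simple stand-in for handling the SSS problem) and the toy data are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def bda_projection(pos, neg, n_components=1, reg=1e-3):
    """Vector-based BDA sketch: keep positives compact, push negatives away.

    pos, neg: arrays of shape (n_samples, n_features).
    reg: ridge term added to the positive scatter (an assumption made here
         so the matrix stays invertible when samples are scarce).
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    mean_pos = pos.mean(axis=0)

    # Both scatters are centred on the POSITIVE mean: this is the "bias".
    d_pos = pos - mean_pos
    d_neg = neg - mean_pos
    s_pos = d_pos.T @ d_pos
    s_neg = d_neg.T @ d_neg

    # Whiten the regularised positive scatter, then diagonalise the
    # negative scatter in the whitened space (generalised eigenproblem).
    evals, evecs = np.linalg.eigh(s_pos + reg * np.eye(s_pos.shape[0]))
    whiten = evecs / np.sqrt(evals)
    w_evals, w_evecs = np.linalg.eigh(whiten.T @ s_neg @ whiten)
    top = np.argsort(w_evals)[::-1][:n_components]
    return whiten @ w_evecs[:, top], mean_pos


# Toy data: one compact positive cluster, negatives scattered in two places.
rng = np.random.default_rng(0)
pos = rng.normal(0.0, 0.1, size=(200, 3))
neg = np.vstack([
    rng.normal([5.0, 0.0, 0.0], 0.1, size=(20, 3)),
    rng.normal([-5.0, 0.0, 0.0], 0.1, size=(20, 3)),
])
W, mean_pos = bda_projection(pos, neg)
pos_spread = np.abs((pos - mean_pos) @ W).mean()
neg_spread = np.abs((neg - mean_pos) @ W).mean()
```

In this toy setting the two negative clusters lie on opposite sides of the object, so a two-class formulation that summarizes the background by a single mean would place that mean on top of the positives. Measuring the negative scatter around the positive mean avoids this.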

However, these discriminant analysis methods still rely on the image-as-vector representation and lose the spatial structure of the two-dimensional image. The SSS problem can be addressed by introducing the tensor representation [22]. In this paper, we propose tensor biased discriminant analysis (TBDA), which solves the SSS problem that exists in BDA; develop an online learning scheme for TBDA, named incremental tensor biased discriminant analysis (ITBDA); and present a third-order color tensor-based visual tracking method that employs ITBDA to distinguish objects from the background. The contributions of this paper can be summarized as follows: (1) a new appearance construction method that integrates the spatial color information into a third-order tensor representation, which makes the appearance model more distinguishable; (2) tensor biased discriminant analysis (TBDA), which generalizes BDA to the tensor representation and is able to deal with the SSS problem caused by the vectorization in BDA; (3) incremental tensor biased discriminant analysis (ITBDA), which is suitable for distinguishing objects from object-like background online.
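To make the first contribution concrete, the following sketch contrasts a global color histogram with a third-order color tensor (rows × columns × color channels). The patch size, the helper names `color_tensor`, `unfold` and `color_histogram`, and the column ordering of the unfolding are assumptions chosen for illustration; the paper's exact construction is not shown in this excerpt.

```python
import numpy as np


def color_tensor(patch):
    """Return the patch as a float third-order tensor (rows, cols, channels)."""
    t = np.asarray(patch, dtype=float)
    if t.ndim != 3:
        raise ValueError("expected an array of shape (rows, cols, channels)")
    return t


def unfold(tensor, mode):
    """Mode-n unfolding: fibres along `mode` become the columns of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def color_histogram(patch, bins=4):
    """Per-channel histogram; discards where each pixel is located."""
    patch = np.asarray(patch)
    return np.stack([
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(patch.shape[-1])
    ])


rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(8, 6, 3))

# Shuffle pixel positions: same colors, different spatial layout.
pixels = patch.reshape(-1, 3)
shuffled = pixels[rng.permutation(len(pixels))].reshape(patch.shape)

same_histogram = np.array_equal(color_histogram(patch), color_histogram(shuffled))
same_tensor = np.array_equal(color_tensor(patch), color_tensor(shuffled))
unfold_shapes = [unfold(color_tensor(patch), m).shape for m in range(3)]
```

The histogram cannot tell the original patch from its shuffled copy, while the tensor can. The three unfoldings are the matrices on which a mode-wise (multilinear) subspace method would typically operate, each far smaller than the covariance of the fully vectorized patch.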

The remainder of this paper is organized as follows. The related previous work, namely LDA and BDA, is described in Section 2. TBDA and ITBDA are then proposed in Section 3. In Section 4, a color-based visual tracking algorithm is presented. Section 5 reports several experiments that validate the effectiveness of the proposed tracking method under large pose, scale and lighting changes for single-object tracking, as well as partial occlusion for multiple-object tracking. The final section draws conclusions and discusses future work.

Section snippets

Previous works

This section introduces previous work, including linear discriminant analysis (LDA) and biased discriminant analysis (BDA).
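For reference, the two criteria are commonly written as follows. The notation is assumed here and may differ from the paper's own: $S_b$ and $S_w$ are the between-class and within-class scatter matrices, $x_i$ are the positive samples with mean $m_x$, and $y_j$ are the negative samples.

```latex
% LDA: separate class means relative to the within-class scatter
W_{\mathrm{LDA}} = \arg\max_{W}
  \frac{\lvert W^{\top} S_b W \rvert}{\lvert W^{\top} S_w W \rvert}

% BDA: both scatters are measured around the positive mean m_x
W_{\mathrm{BDA}} = \arg\max_{W}
  \frac{\lvert W^{\top} S_y W \rvert}{\lvert W^{\top} S_x W \rvert},
\qquad
S_x = \sum_{i=1}^{N_x} (x_i - m_x)(x_i - m_x)^{\top},
\qquad
S_y = \sum_{j=1}^{N_y} (y_j - m_x)(y_j - m_x)^{\top}
```

Since $S_x$ is built from $N_x$ samples centred on their own mean, its rank is at most $N_x - 1$. When the vectorized image has more dimensions than there are positive samples, $S_x$ is singular, which is the small sample size problem.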

Incremental tensor biased discriminant analysis

In this section, a new supervised subspace method, tensor biased discriminant analysis (TBDA), is developed as the generalization of biased discriminant analysis (BDA) [19] to the tensor representation; it distinguishes the positive and negative classes while focusing mainly on the class of interest. Because the tensor representation makes full use of the structure information of the object, which is a reasonable constraint [22] to reduce the number of the unknown parameters used to represent a

ITBDA-based visual tracking

Visual tracking can be formulated as a classification problem between the object and background observations. The observations are encoded with the locations or motion parameters of the objects through the unobservable states, and the task is to infer these unobservable states from the observed images over time.
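Inferring hidden states from observations over time is typically written as a recursive Bayesian filter. The standard recursion is given below as a sketch, with $X_t$ the hidden state and $O_t$ the observed image at time $t$ (symbols assumed here); this excerpt does not show which motion and observation models the paper actually uses.

```latex
p(X_t \mid O_{1:t}) \;\propto\;
  p(O_t \mid X_t)
  \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid O_{1:t-1})\, \mathrm{d}X_{t-1}
```

In a discriminative tracker of this kind, the observation likelihood $p(O_t \mid X_t)$ would be the natural place for a learned subspace to enter, for example by scoring how object-like the candidate patch at $X_t$ is.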

Experimental results

To evaluate the performance of the proposed tracking algorithm, we collected eight videos with human faces and pedestrians as the tracking objects. The first three video sequences were captured indoors with large pose variation and drastic illumination changes; the last five video sequences were recorded in a shopping center in Portugal.

We test five and three video sequences for single- and multiple-object tracking, respectively, to

Conclusion and future work

This paper proposes a new color-cue-based visual tracking method that incrementally learns the object's color structure information and discriminates the object from the background online. To this end, we extend biased discriminant analysis from a vector-based method to a tensor-based one so that the structure information is well combined with the color information, and present batch tensor biased discriminant analysis (TBDA) and its incremental version, ITBDA,

Acknowledgements

We thank the anonymous reviewers for their helpful comments and suggestions. This research was supported by the National Natural Science Foundation of China (60771068, 60702061, 60832005), the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) in China and the National Laboratory of Automatic Target Recognition, Shenzhen University, China.

Jing Wen received the B.Sc. degree in Electronic Information Science and Technology from Shanxi University, Taiyuan, China, in 2003, and the M.Eng. degree in Signal and Information Processing from Xidian University, Xi’an, China, in 2006. Since August 2006, she has been pursuing her Ph.D. degree in Pattern Recognition and Intelligent System at Xidian University. Her research interests include pattern recognition and computer vision.

References (35)

  • M. Isard, A. Blake, Contour tracking by stochastic propagation of conditional density, in: The Fourth European...
  • T. Cootes et al., Robust real-time periodic motion detection analysis and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
  • P. Perez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: The Seventh European Conference on...
  • D. Comaniciu et al., Kernel-based object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence (2003)
  • S. Avidan, Ensemble tracking, in: Proceedings of the 2005 IEEE International Conference on Computer Vision and Pattern...
  • D. Ross, J. Lim, M.H. Yang, Adaptive probabilistic visual tracking with incremental subspace update, in: The Eighth...
  • R.S. Lin et al., Adaptive discriminative generative model and its applications, Advances in Neural Information Processing Systems (2004)

    Xinbo Gao received the B.Sc., M.Sc. and Ph.D. degrees in signal and information processing from Xidian University, China, in 1994, 1997 and 1999, respectively. From 1997 to 1998, he was a research fellow in the Department of Computer Science at Shizuoka University, Japan. From 2000 to 2001, he was a postdoctoral research fellow in the Department of Information Engineering at the Chinese University of Hong Kong. In 2001, he joined the School of Electronic Engineering at Xidian University. Currently, he is a Professor of Pattern Recognition and Intelligent System, and Director of the VIPS Lab, Xidian University. His research interests are computational intelligence, machine learning, computer vision, pattern recognition and artificial intelligence. In these areas, he has published 4 books and around 100 technical articles in refereed journals and proceedings including IEEE TIP, TCSVT, TNN, TSMC, etc. He is on the editorial boards of journals including EURASIP Signal Processing (Elsevier) and Neurocomputing (Elsevier). He has served as general chair/co-chair, program committee chair/co-chair, or PC member for around 30 major international conferences.

    Yuan Yuan is currently a Lecturer with the School of Engineering and Applied Science, Aston University, United Kingdom. She received her B.Eng. degree from the University of Science and Technology of China, China, and the Ph.D. degree from the University of Bath, United Kingdom. She has over 60 scientific publications in journals and conferences on visual information processing, compression, retrieval, etc. She is an associate editor of International Journal of Image and Graphics (World Scientific), an editorial board member of Journal of Multimedia (Academy Publisher), a guest editor of Signal Processing (Elsevier), and a guest editor of Recent Patents on Electrical Engineering. She was a chair of some conference sessions, and a member of program committees of many conferences. She is a reviewer for several IEEE transactions, other international journals and conferences.

    Dacheng Tao received the B.Eng. degree from the University of Science and Technology of China (USTC), the M.Phil. degree from the Chinese University of Hong Kong (CUHK), and the Ph.D. degree from the University of London (Lon). Currently, he is a Nanyang Assistant Professor with the School of Computer Engineering at Nanyang Technological University and holds a visiting post in Lon. He is a Visiting Professor at Xidian University and a Guest Professor at Wuhan University. His research is mainly on applying statistics and mathematics to data analysis problems in computer vision, multimedia, machine learning, data mining, and video surveillance. He has published more than 100 scientific papers including IEEE TPAMI, TIP, TKDE, CVPR, ECCV, NIPS, ICDM; ACM TKDD, Multimedia, KDD, etc., with best paper runner up awards and finalists. One of his TPAMI papers received an interview with ScienceWatch.com (Thomson Scientific). His H-Index in Google Scholar is 14 and his Erdős number is 3. He holds the K.C. WONG Education Foundation Award.

    Jie Li received the B.Sc., M.Sc. and Ph.D. degrees in Circuit and System from Xidian University, China, in 1995, 1998 and 2005, respectively. In 1998, she joined the School of Electronic Engineering at Xidian University. Currently, she is an Associate Professor at Xidian University. Her research interests include computational intelligence, machine learning, and image processing. In these areas, she has published over 30 technical articles in refereed journals and proceedings including IEEE TCSVT, IJFS, etc.