Elsevier

Signal Processing

Volume 104, November 2014, Pages 248-257

Energy-based model of least squares twin Support Vector Machines for human action recognition

https://doi.org/10.1016/j.sigpro.2014.04.010

Highlights

  • We extend the LS-TSVM algorithm to an energy-based model, called ELS-TSVM, for human action recognition.

  • An energy parameter is introduced for each hyperplane so that the model remains flexible in the face of outliers in each action class.

  • ELS-TSVM performed several orders of magnitude faster than SVM.

  • One-versus-rest SVM suffers from the unbalanced dataset problem in multi-class classification; ELS-TSVM addresses it.

Abstract

Human action recognition is an active field of research in pattern recognition and computer vision. For this purpose, several approaches based on bag-of-words features and support vector machine (SVM) classifiers have been proposed. Multi-category classification of human actions is usually performed by solving many one-versus-rest binary SVM classification tasks. However, this leads to the class imbalance problem. Furthermore, because of environmental problems and the intrinsic noise of spatio-temporal features, videos of similar actions may suffer from huge intra-class variations. In this paper, we address these problems by introducing the Energy-based Least Squares Twin Support Vector Machine (ELS-TSVM) algorithm. ELS-TSVM is an extended LS-TSVM classifier that performs classification by using two nonparallel hyperplanes instead of the single hyperplane used in the conventional SVM. ELS-TSVM not only assigns a different energy to each class but also handles the unbalanced dataset problem. We investigate the performance of the proposed method on the Weizmann, KTH, and Hollywood datasets and on ten UCI datasets, all of which have been extensively studied by research groups. Experimental results show the effectiveness and validity of its noise handling on the human action and UCI datasets. ELS-TSVM also obtains superior accuracy compared with the related methods, while its time complexity is remarkably lower than that of SVM.

Introduction

Human action recognition is one of the important research areas in computer vision and pattern recognition. It has a wide range of applications such as surveillance systems, human-computer interaction, video retrieval, and gesture recognition. In the past decade, with the growth in video quality and personal video recording, the need for automatic video analysis and event recognition has increased. The difficulty of human action recognition stems from several challenges such as illumination changes, partial occlusions, and intra-class differences [1].

Recently, the Bag of Words (BoW) representation combined with the support vector machine (SVM) has attracted much interest for human action recognition [2], [3], [4]. In this approach, feature descriptors are extracted from all the training sequences, and a codebook is built by clustering similar features. The cluster centroids, called video words, are the members of this codebook. Each feature descriptor is assigned to a certain video word (cluster centroid). An action video is then represented as a histogram of the number of occurrences of each video word. Finally, classification methods are used to build a model for each action class.
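The codebook and histogram steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the feature pipeline used in the paper: the plain k-means loop, the function names (`build_codebook`, `assign_words`, `bow_histogram`), and the use of random vectors in place of real spatio-temporal descriptors are all assumptions made for the example.

```python
import numpy as np


def assign_words(descriptors, centroids):
    """Return the index of the nearest centroid (video word) for every descriptor."""
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors with plain k-means; the k centroids are the video words.

    The choice of k-means and its settings are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Initialize with k distinct descriptors picked at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids


def bow_histogram(descriptors, centroids):
    """Count how often each video word occurs among the descriptors of one video."""
    words = assign_words(descriptors, centroids)
    return np.bincount(words, minlength=len(centroids))
```

Calling `build_codebook` once on the descriptors pooled from all training videos and then `bow_histogram` on the descriptors of each individual video yields one fixed-length count vector per video, which is what the classifier receives as input.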

The support vector machine was originally proposed by Cortes and Vapnik [5] for binary classification. SVM has been successfully applied in a wide spectrum of research areas such as face recognition, object categorization, and biomedicine [6], [7], [8], [9]. However, the computational complexity of SVM is O(l^3), where l denotes the total number of training samples, and this drawback restricts the application of SVM to large-scale problem domains. Moreover, since the optimal hyperplane obtained by SVM depends on only a small subset of the samples (the support vectors), it is very sensitive to outliers and noisy samples. Finally, multi-category classification of human actions is usually done by solving many one-versus-rest binary SVM classification tasks. Each binary SVM is trained on all of the patterns, with one class as positive and all remaining classes as negative, so it easily leads to the class imbalance problem.
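The imbalance introduced by the one-versus-rest protocol is easy to see by counting. The short sketch below uses hypothetical labels (six action classes with ten videos each, not data from the paper) and a helper name, `one_vs_rest_counts`, chosen only for this illustration.

```python
from collections import Counter


def one_vs_rest_counts(labels):
    """For each class, return (positives, negatives) of its one-versus-rest binary task."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: (n, total - n) for cls, n in counts.items()}


# Hypothetical, perfectly balanced multi-class dataset: 6 classes x 10 videos.
labels = [name for name in ["walk", "run", "jump", "bend", "wave", "skip"] for _ in range(10)]
for cls, (pos, neg) in one_vs_rest_counts(labels).items():
    print(f"{cls}: {pos} positive vs {neg} negative samples")
```

Even though every class has the same number of videos, each binary task sees five times as many negative samples as positive ones, and the ratio grows with the number of classes.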

To deal with these issues, we propose a fast classifier for activity recognition based on Twin Support Vector Machines (TSVM). TSVM was proposed by Jayadeva et al. [10] for binary classification. This method generates two nonparallel hyperplanes by solving two smaller-sized Quadratic Programming Problems (QPPs) such that each hyperplane is closer to one class and as far as possible from the other. Solving two smaller-sized QPPs rather than a single larger-sized QPP, as in SVM, makes the learning of TSVM approximately four times faster than the conventional SVM [10]: with cubic complexity and classes of roughly equal size, two problems of size l/2 cost about 2(l/2)^3 = l^3/4. Least Squares Twin Support Vector Machine (LS-TSVM) [11] is an extension of TSVM that replaces the convex QPPs in TSVM with a system of linear equations by using a squared loss function instead of the hinge loss. This formulation leads to an extremely simple and fast algorithm. In the proposed ELS-TSVM, the constraints of LS-TSVM are converted to an energy model, which can reduce the adverse effects of noisy data and outliers. In addition, in the one-versus-rest protocol of ELS-TSVM for multi-class classification, imbalanced datasets do not affect the model learning.
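To make the idea of two nonparallel hyperplanes obtained from linear systems concrete, the following Python sketch fits a linear least-squares twin classifier. The closed-form solutions follow the standard linear LS-TSVM formulation with a squared loss. The `energy1` and `energy2` arguments are only an assumed way of letting a per-plane energy replace the unit target in the constraints, and the `ridge` term, the nearest-plane decision rule, and all function names are likewise illustrative assumptions rather than the formulation given in this paper. With both energies set to 1, the code reduces to plain LS-TSVM.

```python
import numpy as np


def fit_twin_planes(A, B, c1=1.0, c2=1.0, energy1=1.0, energy2=1.0, ridge=1e-8):
    """Fit two hyperplanes x @ w_k + b_k = 0 by solving two small linear systems.

    A: samples of class +1, shape (m1, n). B: samples of class -1, shape (m2, n).
    energy1, energy2 and ridge are illustrative assumptions (see the text above).
    """
    H = np.hstack([A, np.ones((A.shape[0], 1))])  # [A e]
    G = np.hstack([B, np.ones((B.shape[0], 1))])  # [B e]
    reg = ridge * np.eye(H.shape[1])              # tiny ridge term for numerical stability
    # Plane 1: close to class +1, at (signed) distance target -energy1 from class -1.
    z1 = -np.linalg.solve(G.T @ G + (H.T @ H) / c1 + reg,
                          G.T @ np.full(G.shape[0], energy1))
    # Plane 2: close to class -1, at (signed) distance target +energy2 from class +1.
    z2 = np.linalg.solve(H.T @ H + (G.T @ G) / c2 + reg,
                         H.T @ np.full(H.shape[0], energy2))
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])


def predict_twin_planes(X, plane1, plane2):
    """Assign +1 if a sample is nearer to plane 1 than to plane 2, otherwise -1."""
    (w1, b1), (w2, b2) = plane1, plane2
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)  # perpendicular distance to plane 1
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)  # perpendicular distance to plane 2
    return np.where(d1 <= d2, 1, -1)
```

Each plane requires only one solve of an (n+1) x (n+1) system, where n is the feature dimension, which is why this family of classifiers trains quickly compared with solving a QPP over all training samples.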

The paper is organized as follows: we first review the related work in Section 2. In Section 3, we describe the proposed human action recognition framework and introduce the ELS-TSVM. In Section 4, the experimental results on common datasets are given. Finally, Section 5 contains concluding remarks.

Related work and background

Comprehensive reviews of human action recognition approaches can be found in survey papers such as [1], [12], [13], [14]. In general, feature representations of video sequences can be divided into two categories: top-down (global) [15], [16] and bottom-up (local) [17], [18], [19] representations. The global strategy first localizes the region of the person in the video by background subtraction, and then represents the region of interest as a whole. In this way, global

Human action recognition framework

In this section, each step of the proposed human action recognition framework is described in detail. The action representation is described in Section 3.1. The proposed ELS-TSVM classification algorithm is presented in Section 3.2. Finally, ELS-TSVM is discussed in Section 3.3. The framework of the proposed action recognition method is illustrated in Fig. 1.

Experimental results

In this section, ELS-TSVM is employed to recognize human actions. For this purpose, we compare our ELS-TSVM method with other related methods on the Weizmann, KTH, and Hollywood action datasets. Figs. 2 and 4 provide some sample frames from the action datasets. As shown in [29], reported accuracy rates can differ by up to 10.67% when different validation approaches are applied to the same data. In our experiments, the leave-one-person-out cross-validation approach

Conclusion

In this paper, we have extended the LS-TSVM classifier to an energy-based model called ELS-TSVM for human action recognition. The energy for each hyperplane (E(1), E(2)) in ELS-TSVM has been introduced to be flexible in the face of outliers of each action. The ELS-TSVM classifier performs classification by using two nonparallel hyperplanes, unlike SVM, which uses a single hyperplane. The proposed framework has addressed some pitfalls of previous SVM-based action recognition frameworks, such as

Acknowledgment

This research is partially supported by ITRC (Iran Telecommunication Research Center) under contract no. 6979/500.

References (41)

  • Z. Lu et al., Latent semantic learning with structured sparse representation for human action recognition, Pattern Recognit. (2013)
  • J.A. Suykens et al., Weighted least squares support vector machines: robustness and sparse approximation, Neurocomputing (2002)
  • L. Chen, H. Wei, J. Ferryman, A survey of human motion analysis using depth imagery, Pattern Recognit....
  • C. Cortes et al., Support-vector networks, Mach. Learn. (1995)
  • R. Khemchandani et al., Twin support vector machines for pattern classification, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • M.A. Kumar et al., Least squares twin support vector machines for pattern classification, Expert Syst. Appl. (2009)
  • J. Aggarwal et al., Human activity analysis: a review, ACM Comput. Surv. (CSUR) (2011)
  • Y. Wang, K. Huang, T. Tan, Human activity recognition based on R transform, in: IEEE Conference on Computer Vision and...
  • V. Kellokumpu, G. Zhao, M. Pietikäinen, Human activity recognition using a dynamic texture based method, in: BMVC,...
  • A. Ghodrati et al., Human action categorization using discriminative local spatio-temporal feature weighting, Intell. Data Anal. (2012)