Pattern Recognition

Volume 37, Issue 5, May 2004, Pages 1011-1024

Gesture recognition using Bezier curves for visualization navigation from registered 3-D data

https://doi.org/10.1016/j.patcog.2003.11.007

Abstract

This paper presents a gesture recognition system for visualization navigation. Scientists are interested in developing interactive settings for exploring large data sets in an intuitive environment. The input consists of registered 3-D data. A geometric method using Bezier curves is used for trajectory analysis and classification of gestures. Hand speed is incorporated into the algorithm to enable correct recognition of trajectories with variations in gesture speed. The method is robust and reliable: the hand is identified correctly in 99.9% of 1641 frames, the mode of hand movement is detected correctly 95.6% of the time, and the recognition rate (given the correct mode) is 97.9%. An application to gesture-controlled visualization of 3-D bioinformatics data is also presented.

Introduction

Large and complex data sets are produced at a faster pace than the tools and algorithms for their processing, analysis, and exploration. For example, the National Institutes of Health (NIH) Visible Human project generated a data set in which a single 3-D volume consists of 12 billion elements. Nearly a terabyte of satellite data is produced daily. Advanced physics simulations at Lawrence Livermore National Laboratory (LLNL) generate large data sets, expected to reach one terabyte every 5 min by 2004.

Among the tools used to help explore and understand large data sets, visualization aids in gaining insight into important physical parameters (such as temperature, height, stress, velocity, or pressure) and in finding anomalies. Such anomalies are often missed by automatic localization but are easily picked out through visual data exploration, and a suitable representation can drastically reduce the time needed for analysis. As data sets grow, so does processing time; even the latest supercomputers may require days or weeks for the computations. This makes real-time visualization mission-critical: interesting properties can emerge during a run, allowing scientists to adjust the parameters of a computation and restart it if needed. Visualization is integrated into the process and is no longer just the last step.

State-of-the-art visualization displays keep pace with these data requirements. For instance, one of LLNL's “power walls” (Fig. 1(a)) is a 15-projector system that displays approximately 19.7 million pixels on a 16- by 8-ft screen. Systems like this allow detailed data analysis and team collaboration. However, applying even simple commands to the data (such as zoom, rotation, and translation) requires a secondary, or “background”, communication process between the scientists working with the data (the two standing by the screen in Fig. 1(a)) and the “operator” responsible for executing selected commands (sitting, left). This reduces the team's productivity and degrades the quality of presentations.

Therefore, scientists are interested in developing new, interactive settings for exploring their data in a more intuitive environment. A gesture-recognition system can interpret commands and supply data manipulation parameters to visualization software (Fig. 1(b)) without having an “operator” involved in the process.

Since the system is being developed as a front end for gesture-controlled, large-scale visualization and virtual reality manipulation, certain requirements and complications are apparent. First, 3-D information is required, not necessarily at video-frame rate but at least a few times per second (optimal parameters should be determined by testing on a large group of people). Second, traditional techniques such as background subtraction cannot easily separate the figure from the background, since the entire body of the interacting person (not only the arms or hands) is moving. Moreover, interaction takes place in front of a screen on which the data is updated dynamically, so the background changes most of the time. Third, the motion of the interacting person should be natural and should result in intuitive data manipulation, where intuitive means easy to learn and fast enough to provide immediate results.

Gesture tracking and recognition are important research domains. Traditional approaches to tracking typically relied on segmentation of the intensity data using motion or appearance cues. Most methods began by segmenting the human body from the background. For example, in “blob approaches”, people were modeled as a number of blobs resulting from pixel classification based on color and position in the image. Wren et al. [1] achieved segmentation by classifying pixels into one of several models, including a static world and a dynamic user represented by Gaussian blobs. Yang and Ahuja [2] used skin color and the geometry of palm and face regions in the segmentation stages of their system; a Gaussian mixture (with parameters estimated by an EM algorithm) modeled the distribution of skin-color pixels. Rehg and Kanade [3] used a 3-D hand model to track a hand, comparing line features from the images with the projected model and performing incremental state corrections. Similar work was presented by Kuch and Huang [4], in which the synthesis process could fit the hand model to any person's hand. Cutler and Davis [5] segmented the motion and computed a moving object's self-similarity (including human motion experiments).

A significant amount of work has been done on recognition, where hidden Markov models (HMMs) are often employed successfully [6], [7], [8] to address the highly stochastic nature of human gestures. Yacoob and Black proposed a parameterized representation of human movement in the form of principal components [9]. Bobick and Wilson [10] treated a gesture as a sequence of states and computed configuration states along prototype gestures. Yang and Ahuja [2] used motion trajectories for recognition. Grzeszcuk et al. [11] described a classification algorithm based on statistical moments of binarized gesture templates. Hong et al. [12] treated each gesture as a finite state machine (FSM) in the spatial-temporal space; the FSMs were trained using k-means clustering. A neural network trained in advance was used by Sato et al. [13]. Hongo et al. [14] performed recognition with linear discriminant analysis in each discriminant space, using four directional features. The approach described by Yoon et al. [15] derived features from location, angle, and velocity and employed a k-means clustering algorithm for the HMMs. Gesture contour representation and alignment-based classification were proposed by Gupta and Ma [16]. A review by Aggarwal and Cai [17] classified approaches to human motion analysis, the tasks involved, and the major areas related to human motion interpretation. A review by Pavlovic et al. [18] addressed the main components and directions of gesture recognition research for human-computer interaction (HCI).

Section snippets

Overview

In this section, we describe the method for recognizing three gesture types: rotation, zoom, and translation. Given the 3-D trajectory of the manipulating hand, we fit a Bezier curve to the trajectory. The curvature of the curve is used to determine the gesture.
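For reference, a cubic Bezier curve with control points P_0, …, P_3 and the curvature of the resulting space curve are given by the standard formulas below (the cubic degree is an illustrative assumption; this excerpt does not state the degree the authors use):

\[
B(t) = \sum_{i=0}^{3} \binom{3}{i} (1-t)^{3-i}\, t^{i}\, P_i, \quad t \in [0,1],
\qquad
\kappa(t) = \frac{\lVert B'(t) \times B''(t) \rVert}{\lVert B'(t) \rVert^{3}}.
\]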

Gesture recognition involves five steps:

  1. Detecting the manipulating hand.

  2. Identifying the beginning of the gesture.

  3. Detecting the end of the gesture.

  4. Computing the 3-D trajectory of the manipulating hand.

  5. Recognizing the gesture by fitting a Bezier curve to the trajectory and analyzing its curvature (a code sketch follows this list).
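Step 5 is the geometric core of the method. The following is a minimal sketch of that step, not the authors' implementation: it fits a cubic Bezier curve to a 3-D trajectory by linear least squares and evaluates the curvature formula above. The chord-length parameterization (one simple way to account for variations in hand speed) and all function names are our illustrative assumptions.

```python
import numpy as np

def bernstein_matrix(t):
    """Rows of cubic Bernstein basis values for parameters t in [0, 1]."""
    t = np.asarray(t)[:, None]
    return np.hstack([(1 - t) ** 3,
                      3 * t * (1 - t) ** 2,
                      3 * t ** 2 * (1 - t),
                      t ** 3])

def fit_cubic_bezier(points):
    """Least-squares control points (4 x 3) for a 3-D trajectory (n x 3)."""
    # Chord-length parameterization: the spacing of t follows the distance
    # traveled, which compensates for uneven hand speed along the trajectory.
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(points, axis=0), axis=1))]
    t = d / d[-1]
    ctrl, *_ = np.linalg.lstsq(bernstein_matrix(t), points, rcond=None)
    return ctrl, t

def bezier_curvature(ctrl, t):
    """Curvature kappa(t) = |B' x B''| / |B'|^3 of the fitted cubic curve."""
    p0, p1, p2, p3 = ctrl
    t = np.asarray(t)[:, None]
    # First and second derivatives of a cubic Bezier curve.
    d1 = 3 * ((1 - t) ** 2 * (p1 - p0)
              + 2 * t * (1 - t) * (p2 - p1)
              + t ** 2 * (p3 - p2))
    d2 = 6 * ((1 - t) * (p2 - 2 * p1 + p0) + t * (p3 - 2 * p2 + p1))
    cross = np.cross(d1, d2)
    return np.linalg.norm(cross, axis=1) / np.linalg.norm(d1, axis=1) ** 3
```

A classifier could then operate on curvature statistics of the fitted curve, for example distinguishing a nearly straight trajectory (curvature close to zero) from an arc-like one; the authors' actual decision rule is not reproduced here.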

Results

The experimental setup consists of a Digiclops camera system (Point Grey Research, [22]) on a 1.5 GHz Pentium 4 PC with 512 MB RAM. The system is based on triangulation between three cameras. Since the camera parameters (the relative positions, focal length, and resolution) are fixed, re-calibration is not usually required. The results are organized in four sections: manipulating hand detection, manipulation mode detection, gesture recognition, and overall performance. The testing data set includes 100
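As background on how triangulation yields the registered 3-D input (this is the standard rectified-stereo relation, not Digiclops-specific detail): for a camera pair with focal length f, baseline b, and measured disparity d, the depth of a matched point is

\[
Z = \frac{f\, b}{d},
\]

and a trinocular system such as this one adds a second baseline, which helps disambiguate correspondence matches.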

Conclusions

Visual data exploration has tremendous capabilities for revealing properties and abnormalities in large data sets. This paper described a gesture recognition system for visualization navigation. Scientists are interested in developing interactive settings for exploring large data sets in an intuitive environment. The input consists of registered 3-D data. Bezier curves are used for trajectory analysis and classification of gestures. The system improved upon previous work by emphasizing

Acknowledgements

We would like to thank the LLNL VIEWS Visualization project for Fig. 1(a); the example data set in Fig. 1(b) appears courtesy of Art Mirin of LLNL. We also thank Benjamin Lok of the University of Florida for Fig. 11.


References (22)

  • Y. Yacoob et al.

    Parameterized modeling and recognition of activities

Comput. Vision Image Understand.

    (1999)
  • H.S. Yoon et al.

    Hand gesture recognition using combined features of location, angle and velocity

    Pattern Recognition

    (2001)
  • C. Wren et al.

Pfinder: real-time tracking of the human body

IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)
  • M.-H. Yang, N. Ahuja, Recognizing hand gestures using motion trajectories, in: Proceedings of IEEE CS Conference on...
  • J.M. Rehg, T. Kanade, Visual tracking of high DOF articulated structures: an application to human hand tracking,...
  • J.J. Kuch, T.S. Huang, Model-based tracking of self-occluding articulated objects, in: Vision Based Hand Modeling and...
  • R. Cutler, L. Davis, Real-time periodic motion detection, analysis, and applications, in: Proceedings of IEEE CS...
  • D.J. Moore, I.A. Essa, M.H. Hayes III, Exploiting human actions and object context for recognition tasks, in:...
  • Y. Iwai, H. Shimizu, M. Yachida, Real-time context-based gesture recognition using hmm and automaton, in: Proceedings...
  • C. Vogler, H. Sun, D. Metaxas, A framework for motion recognition with applications to American sign language and gait...
  • A.F. Bobick et al.

    A state-based approach to the representation and recognition of gesture

IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)

About the Author—MIN C. SHIN received the B.S., M.S., and Ph.D. degrees in computer science from the University of South Florida, Tampa, in 1992, 1996, and 2001, respectively. He received the University of South Florida Graduate Council's Outstanding Dissertation Prize.

He is currently an Assistant Professor in the Department of Computer Science at the University of North Carolina at Charlotte. His research interests include gesture recognition, range image analysis, nonrigid motion analysis, and performance evaluation. Dr. Shin is a member of IEEE, UPE, and the Golden Key Honor Society. More information can be obtained from http://www.cs.uncc.edu/~mcshin.

About the Author—LEONID V. TSAP received the B.S. degree in Computer Science from the Kiev Civil Engineering Institute, Ukraine, in 1991, and the M.S. and Ph.D. degrees in Computer Science from the University of South Florida, Tampa, in 1995 and 1999, respectively. He is a three-time winner of the annual University of South Florida USPS Scholarship Award and a recipient of the Provost's Commendation for Outstanding Teaching by a Graduate Student. He also received the University of South Florida Graduate Council's Outstanding Dissertation Prize. He is currently with the Advanced Communications and Signal Processing Group (Electronics Engineering Department) at the University of California Lawrence Livermore National Laboratory.

Leonid V. Tsap is a member of the IEEE-CS and ACM. He is a member of the Editorial Board of the Pattern Recognition journal. His current research interests include image analysis/computer vision, nonrigid motion analysis, pattern recognition, perceptual user interfaces, physically-based modeling, and biocomputing. His research has resulted in 24 refereed publications. More information can be obtained from http://marathon.csee.usf.edu/~tsap and http://www.llnl.gov/CASC/people/tsap.

About the Author—DMITRY B. GOLDGOF received the Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1989. He is currently a Professor in the Department of Computer Science and Engineering at the University of South Florida in Tampa and a member of the H. Lee Moffitt Cancer Center and Research Institute. Professor Goldgof's research interests include motion and deformation analysis of biological objects, motion analysis, computer vision, image processing and its biomedical applications, bioinformatics, and pattern recognition. He has graduated 10 Ph.D. and 24 M.S. students, and has published 50 journal papers, over 100 conference publications, 15 book chapters, and 4 books.

Professor Goldgof received Annual Pattern Recognition Society Awards (for best papers) in 1993 and 2002. His paper entitled “Automatic tumor segmentation using knowledge-based techniques” was selected by the International Medical Informatics Association for the 2000 IMIA Yearbook containing “the best of medical informatics”. Professor Goldgof is a senior member of IEEE. He is the North American Editor of the Image and Vision Computing Journal and an Associate Editor of IEEE Transactions on Systems, Man and Cybernetics, Part B. Dr. Goldgof has served as a member of the Editorial Board of Pattern Recognition (1990–2001), a member of the International Association for Pattern Recognition (IAPR) Education Committee (2000–2002), and as an Associate Editor of IEEE Transactions on Image Processing (1996–1998). More information can be obtained from http://marathon.csee.usf.edu/~goldgof/.

This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract number W-7405-Eng-48. UCRL-JC-152416.
