Pattern Recognition

Volume 37, Issue 5, May 2004, Pages 1011-1024

Gesture recognition using Bezier curves for visualization navigation from registered 3-D data

https://doi.org/10.1016/j.patcog.2003.11.007

Abstract

This paper presents a gesture recognition system for visualization navigation. Scientists are interested in developing interactive settings for exploring large data sets in an intuitive environment. The input consists of registered 3-D data. A geometric method using Bezier curves is used for trajectory analysis and classification of gestures. Hand speed is incorporated into the algorithm to enable correct recognition of trajectories with variations in gesture speed. The method is robust and reliable: the hand is identified correctly in 99.9% of 1641 frames, the mode of hand movement is detected correctly 95.6% of the time, and the recognition rate (given the correct mode) is 97.9%. An application to gesture-controlled visualization of 3-D bioinformatics data is also presented.

Introduction

Large and complex data sets are produced at a faster pace than the tools and algorithms for their processing, analysis, and exploration. For example, the National Institutes of Health (NIH) Visible Human project generated a data set in which a single 3-D volume consists of 12 billion elements. Nearly a terabyte of satellite data is produced daily. Advanced physics simulations at Lawrence Livermore National Laboratory (LLNL) generate large data sets, expected to reach one terabyte every 5 min by 2004.

Among the tools used to help explore and understand large data sets, visualization aids in gaining insight into important physical parameters (such as temperature, height, stress, velocity, or pressure) and in finding anomalies. Such anomalies are often missed by automatic localization but are easily picked out through visual data exploration, and a suitable representation can drastically reduce the time needed for analysis. As data sets grow, so does processing time; even the latest supercomputers may require days or weeks for the computations. This makes real-time visualization mission-critical: interesting properties can emerge during a run, allowing scientists to adjust the parameters of a computation and restart it if needed. Visualization is integrated into the process and is no longer just the last step.

State-of-the-art visualization displays keep pace with these data requirements. For instance, one of LLNL's “power walls” (Fig. 1(a)) is a 15-projector system that displays approximately 19.7 million pixels on a 16- by 8-ft screen. Systems like this allow detailed data analysis and team collaboration. However, applying even simple commands to the data (such as zoom, rotation, and translation) requires a secondary, or “background”, communication process between the scientists working with the data (the two standing by the screen in Fig. 1(a)) and the “operator” responsible for executing selected commands (sitting, left). This reduces the team's productivity and degrades the quality of presentations.

Therefore, scientists are interested in developing new, interactive settings for exploring their data in a more intuitive environment. A gesture-recognition system can interpret commands and supply data manipulation parameters to visualization software (Fig. 1(b)) without having an “operator” involved in the process.

Since the system is being developed as a front end for gesture-controlled, large-scale visualization and virtual reality manipulation, certain requirements and complications are apparent. First, 3-D information is required, not necessarily at video-frame rate but at least a few times per second (optimal parameters should be determined by testing on a large group of people). Second, traditional techniques such as background subtraction cannot easily separate the figure from the background, since the entire body of the interacting person (not only the arms or hands) is moving. Moreover, interaction takes place in front of a screen on which the data is updated dynamically, so the background changes most of the time. Third, the motion of the interacting person should be natural and should result in intuitive data manipulation, where intuitive means easy to learn and fast enough to provide immediate results.

Gesture tracking and recognition are important research domains. Traditional approaches to tracking typically relied on segmentation of the intensity data using motion or appearance cues. Most methods began by segmenting the human body from the background. For example, in “blob approaches”, people were modeled as a number of blobs resulting from pixel classification based on color and position in the image. Wren et al. [1] achieved segmentation by classifying pixels into one of several models, including a static world and a dynamic user represented by Gaussian blobs. Yang and Ahuja [2] used skin color and the geometry of palm and face regions in the segmentation stages of their system; a Gaussian mixture (with parameters estimated by an EM algorithm) modeled the distribution of skin-color pixels. Rehg and Kanade [3] used a 3-D hand model to track a hand, comparing line features from the images with the projected model and performing incremental state corrections. Similar work was presented by Kuch and Huang [4], in which the synthesis process could fit the hand model to any person's hand. Cutler and Davis [5] segmented the motion and computed a moving object's self-similarity (including human motion experiments).

A significant amount of work has been done on recognition, where hidden Markov models (HMMs) are often employed successfully [6], [7], [8] to address the highly stochastic nature of human gestures. Yacoob and Black proposed a parameterized representation of human movement in the form of principal components [9]. Bobick and Wilson [10] treated a gesture as a sequence of states and computed configuration states along prototype gestures. Yang and Ahuja [2] used motion trajectories for recognition. Grzeszcuk et al. [11] described a classification algorithm based on statistical moments of binarized gesture templates. Hong et al. [12] treated each gesture as a finite state machine (FSM) in the spatial-temporal space; the FSMs were trained using k-means clustering. A neural network trained in advance was used by Sato et al. [13]. Hongo et al. [14] performed recognition with linear discriminant analysis in each discriminant space, using four directional features. The approach described by Yoon et al. [15] derived features from location, angle, and velocity and employed a k-means clustering algorithm for the HMMs. Gesture contour representation and alignment-based classification were proposed by Gupta and Ma [16]. A review by Aggarwal and Cai [17] classified approaches to human motion analysis, the tasks involved, and the major areas related to human motion interpretation. A review by Pavlovic et al. [18] addressed the main components and directions of gesture recognition research for human-computer interaction (HCI).

Section snippets

Overview

In this section, we describe the method for recognizing three gesture types: rotation, zoom, and translation. Given the 3-D trajectory of the manipulating hand, we fit a Bezier curve to the trajectory. The curvature of the curve is used to determine the gesture.
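For reference, a cubic Bezier curve with control points P_0, …, P_3 and the curvature of the resulting space curve are given by the standard formulas below (the cubic degree is an illustrative assumption; this excerpt does not state the degree the authors use):

\[
B(t) = \sum_{i=0}^{3} \binom{3}{i} (1-t)^{3-i}\, t^{i}\, P_i, \quad t \in [0,1],
\qquad
\kappa(t) = \frac{\lVert B'(t) \times B''(t) \rVert}{\lVert B'(t) \rVert^{3}}.
\]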

Gesture recognition involves five steps:

  1. Detecting the manipulating hand.

  2. Identifying the beginning of the gesture.

  3. Detecting the end of the gesture.

  4. Computing the 3-D trajectory of the manipulating hand.

  5. Recognizing the gesture by fitting a Bezier curve to the trajectory and analyzing its curvature (a code sketch follows this list).
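Step 5 is the geometric core of the method. The following is a minimal sketch of that step, not the authors' implementation: it fits a cubic Bezier curve to a 3-D trajectory by linear least squares and evaluates the curvature formula above. The chord-length parameterization (one simple way to account for variations in hand speed) and all function names are our illustrative assumptions.

```python
import numpy as np

def bernstein_matrix(t):
    """Rows of cubic Bernstein basis values for parameters t in [0, 1]."""
    t = np.asarray(t)[:, None]
    return np.hstack([(1 - t) ** 3,
                      3 * t * (1 - t) ** 2,
                      3 * t ** 2 * (1 - t),
                      t ** 3])

def fit_cubic_bezier(points):
    """Least-squares control points (4 x 3) for a 3-D trajectory (n x 3)."""
    # Chord-length parameterization: the spacing of t follows the distance
    # traveled, which compensates for uneven hand speed along the trajectory.
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(points, axis=0), axis=1))]
    t = d / d[-1]
    ctrl, *_ = np.linalg.lstsq(bernstein_matrix(t), points, rcond=None)
    return ctrl, t

def bezier_curvature(ctrl, t):
    """Curvature kappa(t) = |B' x B''| / |B'|^3 of the fitted cubic curve."""
    p0, p1, p2, p3 = ctrl
    t = np.asarray(t)[:, None]
    # First and second derivatives of a cubic Bezier curve.
    d1 = 3 * ((1 - t) ** 2 * (p1 - p0)
              + 2 * t * (1 - t) * (p2 - p1)
              + t ** 2 * (p3 - p2))
    d2 = 6 * ((1 - t) * (p2 - 2 * p1 + p0) + t * (p3 - 2 * p2 + p1))
    cross = np.cross(d1, d2)
    return np.linalg.norm(cross, axis=1) / np.linalg.norm(d1, axis=1) ** 3
```

A classifier could then operate on curvature statistics of the fitted curve, for example distinguishing a nearly straight trajectory (curvature close to zero) from an arc-like one; the authors' actual decision rule is not reproduced here.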

Results

The experimental setup consists of a Digiclops camera system (Point Grey Research, [22]) on a 1.5 GHz Pentium 4 PC with 512 MB RAM. The system is based on triangulation between three cameras. Since the camera parameters (the relative positions, focal length, and resolution) are fixed, re-calibration is not usually required. The results are organized in four sections: manipulating hand detection, manipulation mode detection, gesture recognition, and overall performance. The testing data set includes 100
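As background on how triangulation yields the registered 3-D input (this is the standard rectified-stereo relation, not Digiclops-specific detail): for a camera pair with focal length f, baseline b, and measured disparity d, the depth of a matched point is

\[
Z = \frac{f\, b}{d},
\]

and a trinocular system such as this one adds a second baseline, which helps disambiguate correspondence matches.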

Conclusions

Visual data exploration has tremendous capabilities for revealing properties and abnormalities in large data sets. This paper described a gesture recognition system for visualization navigation. Scientists are interested in developing interactive settings for exploring large data sets in an intuitive environment. The input consists of registered 3-D data. Bezier curves are used for trajectory analysis and classification of gestures. The system improved upon previous work by emphasizing

Acknowledgements

We would like to thank the LLNL VIEWS Visualization project for Fig. 1(a); the example data set in Fig. 1(b) appears courtesy of Art Mirin of LLNL. We also thank Benjamin Lok of the University of Florida for Fig. 11.


References (22)

  • Y. Yacoob et al.

    Parameterized modeling and recognition of activities

Comput. Vision Image Understand.

    (1999)
  • H.S. Yoon et al.

    Hand gesture recognition using combined features of location, angle and velocity

    Pattern Recognition

    (2001)
  • C. Wren et al.

Pfinder: real-time tracking of the human body

IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)
  • M.-H. Yang, N. Ahuja, Recognizing hand gestures using motion trajectories, in: Proceedings of IEEE CS Conference on...
  • J.M. Rehg, T. Kanade, Visual tracking of high DOF articulated structures: an application to human hand tracking,...
  • J.J. Kuch, T.S. Huang, Model-based tracking of self-occluding articulated objects, in: Vision Based Hand Modeling and...
  • R. Cutler, L. Davis, Real-time periodic motion detection, analysis, and applications, in: Proceedings of IEEE CS...
  • D.J. Moore, I.A. Essa, M.H. Hayes III, Exploiting human actions and object context for recognition tasks, in:...
  • Y. Iwai, H. Shimizu, M. Yachida, Real-time context-based gesture recognition using hmm and automaton, in: Proceedings...
  • C. Vogler, H. Sun, D. Metaxas, A framework for motion recognition with applications to American sign language and gait...
  • A.F. Bobick et al.

    A state-based approach to the representation and recognition of gesture

IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)

About the Author—MIN C. SHIN received the B.S., M.S., and Ph.D. degrees in computer science from the University of South Florida, Tampa, in 1992, 1996, and 2001, respectively. He received the University of South Florida Graduate Council's Outstanding Dissertation Prize.

He is currently an Assistant Professor in the Department of Computer Science at the University of North Carolina at Charlotte. His research interests include gesture recognition, range image analysis, nonrigid motion analysis, and performance evaluation. Dr. Shin is a member of IEEE, UPE, and the Golden Key Honor Society. More information can be obtained from http://www.cs.uncc.edu/~mcshin.

About the Author—LEONID V. TSAP received the B.S. degree in Computer Science from the Kiev Civil Engineering Institute, Ukraine, in 1991, and the M.S. and Ph.D. degrees in Computer Science from the University of South Florida, Tampa, in 1995 and 1999, respectively. He is a three-time winner of the annual University of South Florida USPS Scholarship Award and a recipient of the Provost's Commendation for Outstanding Teaching by a Graduate Student. He also received the University of South Florida Graduate Council's Outstanding Dissertation Prize. He is currently with the Advanced Communications and Signal Processing Group (Electronics Engineering Department) at the University of California Lawrence Livermore National Laboratory.

Leonid V. Tsap is a member of the IEEE-CS and ACM. He is a member of the Editorial Board of the Pattern Recognition journal. His current research interests include image analysis/computer vision, nonrigid motion analysis, pattern recognition, perceptual user interfaces, physically-based modeling, and biocomputing. His research has resulted in 24 refereed publications. More information can be obtained from http://marathon.csee.usf.edu/~tsap and http://www.llnl.gov/CASC/people/tsap.

About the Author—DMITRY B. GOLDGOF received the Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1989. He is currently a Professor in the Department of Computer Science and Engineering at the University of South Florida in Tampa and a member of the H. Lee Moffitt Cancer Center and Research Institute. Professor Goldgof's research interests include motion and deformation analysis of biological objects, motion analysis, computer vision, image processing and its biomedical applications, bioinformatics, and pattern recognition. He has graduated 10 Ph.D. and 24 M.S. students, and has published 50 journal papers, over 100 conference publications, 15 book chapters, and 4 books.

Professor Goldgof received Annual Pattern Recognition Society Awards (for best papers) in 1993 and 2002. His paper entitled “Automatic tumor segmentation using knowledge-based techniques” was selected by the International Medical Informatics Association for the 2000 IMIA Yearbook containing “the best of medical informatics”. Professor Goldgof is a senior member of IEEE. He is the North American Editor of the Image and Vision Computing Journal and an Associate Editor of IEEE Transactions on Systems, Man and Cybernetics, Part B. Dr. Goldgof has served as a member of the Editorial Board of Pattern Recognition (1990–2001), a member of the International Association for Pattern Recognition (IAPR) Education Committee (2000–2002), and as an Associate Editor of IEEE Transactions on Image Processing (1996–1998). More information can be obtained from http://marathon.csee.usf.edu/~goldgof/.

This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract number W-7405-Eng-48. UCRL-JC-152416.
