research-article

A Real-Time Hand Posture Recognition System Using Deep Neural Networks

Authors:
Ao Tang, University of Science and Technology of China, Hefei, China
Ke Lu, University of the Chinese Academy of Sciences, Beijing, China
Yufei Wang, University of Science and Technology of China, Hefei, China
Jie Huang, University of Science and Technology of China, Hefei, China
Houqiang Li, University of Science and Technology of China, Hefei, China

ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 2, Article No. 21, pp. 1–23. https://doi.org/10.1145/2735952

Published: 31 March 2015

Abstract

Hand posture recognition (HPR) is a challenging task, owing both to the difficulty of detecting and tracking hands with ordinary cameras and to the limitations of traditional manually selected features. In this article, we propose a two-stage HPR system for sign language recognition using a Kinect sensor. In the first stage, we propose an effective algorithm for hand detection and tracking. The algorithm incorporates both color and depth information and does not require a uniformly colored or stable background. It handles situations in which the hands are very close to other parts of the body or are not the nearest objects to the camera, and it tolerates occlusion of the hands by faces or other hands. In the second stage, we apply deep neural networks (DNNs) to automatically learn features from hand posture images that are insensitive to movement, scaling, and rotation. Experiments verify that the proposed system works quickly and accurately, achieving a recognition accuracy as high as 98.12%.
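
The abstract does not spell out how color and depth are combined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It marks a pixel as a hand candidate when it is both skin-colored and inside a working depth band. The function names, the Cb/Cr skin thresholds, and the `near_mm`/`far_mm` depth limits are all assumptions chosen for illustration; a real system would add tracking and occlusion handling on top.

```python
import numpy as np

def rgb_to_cbcr(rgb):
    """Return the Cb and Cr planes of an HxWx3 uint8 RGB image (BT.601, full range)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def hand_candidate_mask(rgb, depth_mm, near_mm=500, far_mm=1500):
    """Boolean HxW mask of pixels that are skin-colored AND within the depth band.

    The thresholds below are illustrative assumptions, not values from the article.
    """
    cb, cr = rgb_to_cbcr(rgb)
    # Assumed chrominance box for skin tones.
    skin = (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
    # Assumed working range; a depth of 0 (no reading) falls outside it.
    in_range = (depth_mm >= near_mm) & (depth_mm <= far_mm)
    return skin & in_range
```

Requiring both cues means a skin-colored wall far behind the signer, or a non-skin object close to the sensor, is rejected without assuming a uniform background.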

Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision problems > Tracking
Computing methodologies > Artificial intelligence > Computer vision > Computer vision tasks > Scene understanding

Published in
ACM Transactions on Intelligent Systems and Technology Volume 6, Issue 2
Special Section on Visual Understanding with RGB-D Sensors
May 2015
381 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2753829
Editor: Huan Liu, Arizona State University
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 31 March 2015
- Accepted: 1 January 2014
- Revised: 1 November 2013
- Received: 1 July 2013
Author Tags
Kinect
deep neural networks
hand tracking
posture recognition
Qualifiers
- research-article
- Research
- Refereed