Abstract
Hand posture recognition (HPR) is quite a challenging task, due to both the difficulty in detecting and tracking hands with normal cameras and the limitations of traditional manually selected features. In this article, we propose a two-stage HPR system for Sign Language Recognition using a Kinect sensor. In the first stage, we propose an effective algorithm to implement hand detection and tracking. The algorithm incorporates both color and depth information, without specific requirements on uniform-colored or stable background. It can handle the situations in which hands are very close to other parts of the body or hands are not the nearest objects to the camera and allows for occlusion of hands caused by faces or other hands. In the second stage, we apply deep neural networks (DNNs) to automatically learn features from hand posture images that are insensitive to movement, scaling, and rotation. Experiments verify that the proposed system works quickly and accurately and achieves a recognition accuracy as high as 98.12%.
- Alper Aksaç, Orkun Öztürk, and Tansel Özyer. 2011. Real-time multi-objective hand posture/gesture recognition by using distance classifiers and finite state machine for virtual mouse operations. In Proceedings of the 2011 7th International Conference on Electrical and Electronics Engineering (ELECO’11). IEEE, II--457.Google Scholar
- Antonis A. Argyros and Manolis I. A. Lourakis. 2004. Real-time tracking of multiple skin-colored objects with a possibly moving camera. In Proceedings of the European Conference on Computer Vision (ECCV’04). Springer, 368--379.Google Scholar
- Chuqing Cao and Ruifeng Li. 2010. Real-time hand posture recognition using Haar-like and topological feature. In Proceedings of the 2010 International Conference on Machine Vision and Human-Machine Interface (MVHI’10). IEEE, 683--687. Google ScholarDigital Library
- Manuel Caputo, Klaus Denker, Benjamin Dums, and Georg Umlauf. 2012. 3D hand gesture recognition based on sensor fusion of commodity hardware. In Mensch & Computer 2012: interaktiv informiert--allgegenwäärtig und allumfassend!?Google Scholar
- Douglas Chai and King N. Ngan. 1999. Face segmentation using skin-color map in videophone applications. IEEE Transactions on Circuits and Systems for Video Technology 9, 4 (1999), 551--564. Google ScholarDigital Library
- Feng-Sheng Chen, Chih-Ming Fu, and Chung-Lin Huang. 2003. Hand gesture recognition using a real-time tracking method and hidden Markov models. Image and Vision Computing 21, 8 (2003), 745--758.Google ScholarCross Ref
- George E. Dahl, Dong Yu, Li Deng, and Alex Acero. 2012. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20, 1 (2012), 30--42. Google ScholarDigital Library
- Marco Fagiani, Emanuele Principi, Stefano Squartini, and Francesco Piazza. 2013. A new system for automatic recognition of italian sign language. In Neural Nets and Surroundings. Springer, 69--79.Google Scholar
- Gian Luca Foresti. 1999. Object recognition and tracking for remote video surveillance. IEEE Transactions on Circuits and Systems for Video Technology 9, 7 (1999), 1045--1062. Google ScholarDigital Library
- Wen Gao, Gaolin Fang, Debin Zhao, and Yiqiang Chen. 2004. A Chinese sign language recognition system based on SOFM/SRN/HMM. Pattern Recognition 37, 12 (2004), 2389--2402. Google ScholarDigital Library
- Geoffrey E. Hinton. 2002. Training products of experts by minimizing contrastive divergence. Neural Computation 14, 8 (2002), 1771--1800. Google ScholarDigital Library
- Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. Neural Computation 18, 7 (2006), 1527--1554. Google ScholarDigital Library
- Geoffrey E. Hinton and Ruslan R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504--507.Google Scholar
- Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. 2003. A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University. July, 2003.Google Scholar
- Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu. 2013. 3D convolutional neural networks for human action recognition. (2013).Google Scholar
- Alex Krizhevsky and Geoffrey E. Hinton. 2011. Using very deep autoencoders for content-based image retrieval. In Proceeding of the European Symposium on Artificial Neural Networks (ESANN’11).Google Scholar
- A. Kurakin, Z. Zhang, and Z. Liu. 2012. A real time system for dynamic hand gesture recognition with a depth sensor. In Proceedings of the 20th European Signal Processing Conference (EUSIPCO’12). IEEE, 1975--1979.Google Scholar
- Yann LeCun. 1989. Generalization and network design strategies. Connectionism in Perspective (1989), 143--155.Google Scholar
- Yann LeCun and Yoshua Bengio. 1995. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361, 310 (1995). Google ScholarDigital Library
- Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11 (1998), 2278--2324.Google ScholarCross Ref
- Billy Y. L. Li, Ajmal S. Mian, Wanquan Liu, and Aneesh Krishna. 2013. Using Kinect for face recognition under varying poses, expressions, illumination and disguise. In Proceeding of the 2013 IEEE Workshop on Applications of Computer Vision (WACV). IEEE, 186--192. Google ScholarDigital Library
- Yi Li. 2012. Hand gesture recognition using Kinect. In Proceedings of the 2012 IEEE 3rd International Conference on Software Engineering and Service Science (ICSESS). IEEE, 196--199.Google ScholarCross Ref
- Zhi Li and Ray Jarvis. 2009. Real time hand gesture recognition using a range camera. In Proceedings of the Australasian Conference on Robotics and Automation. 21--27.Google Scholar
- Li Liu and Ling Shao. 2013. Learning discriminative representations from RGB-D video data. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). Google ScholarDigital Library
- S. Malassiotis and M. G. Strintzis. 2008. Real-time hand posture recognition using range data. Image and Vision Computing 26, 7 (2008), 1027--1037. Google ScholarDigital Library
- Vinod Nair and Geoffrey Hinton. 2009. 3-d object recognition with deep belief nets. Advances in Neural Information Processing Systems 22 (2009), 1339--1347.Google Scholar
- C. Nebauer. 1998. Evaluation of convolutional neural networks for visual recognition. IEEE Transactions on Neural Networks 9, 4 (1998), 685--696. Google ScholarDigital Library
- V. Radha and M. Krishnaveni. 2009. Threshold based segmentation using median filter for sign language recognition system. In Proceedings of the World Congress on Nature & Biologically Inspired Computing, 2009 (NaBIC’’09). IEEE, 1394--1399.Google Scholar
- Zhou Ren, Junsong Yuan, and Zhengyou Zhang. 2011. Robust hand gesture recognition based on finger-earth mover’s distance with a commodity depth camera. In Proceedings of the 19th ACM International Conference on Multimedia. ACM, 1093--1096. Google ScholarDigital Library
- Ruslan Salakhutdinov and Geoffrey E. Hinton. 2008. Using deep belief nets to learn covariance kernels for Gaussian processes. Advances in Neural Information Processing Systems (2008), 1249--1256.Google Scholar
- Frank Seide, Gang Li, and Dong Yu. 2011. Conversational speech transcription using context-dependent deep neural networks. In Proceedings of Interspeech. 437--440.Google Scholar
- Poonam Suryanarayan, Anbumani Subramanian, and Dinesh Mandalapu. 2010. Dynamic hand pose recognition using depth data. In Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR’10). IEEE, 3105--3108. Google ScholarDigital Library
- Satoshi Suzuki. 1985. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing 30, 1 (1985), 32--46.Google ScholarCross Ref
- Balazs Tusor and A. R. Varkonyi-Koczy. 2010. Circular fuzzy neural network based hand gesture and posture modeling. In Proceedings of the 2010 IEEE Instrumentation and Measurement Technology Conference (I2MTC’10). IEEE, 815--820.Google Scholar
- Michael Van den Bergh and Luc Van Gool. 2011. Combining RGB and ToF cameras for real-time 3D hand gesture interaction. In Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision (WACV’11). IEEE, 66--72. Google ScholarDigital Library
- Jiang Wang, Zicheng Liu, Jan Chorowski, Zhuoyuan Chen, and Ying Wu. 2012. Robust 3D action recognition with random occupancy patterns. In Proceedings of the European Conference on Computer Vision (ECCV’12). Springer, 872--885. Google ScholarDigital Library
- Yue Gao, Meng Wang, Dacheng Tao, Rongrong Ji, and Qh Dai. 2012. 3D object retrieval and recognition with hypergraph analysis. IEEE Transactions on Image Processing 21, 9 (2012), 4290--4303. Google ScholarDigital Library
- Meng Wang, Xian-Sheng Hua, Tao Mei, Richang Hong, Guojun Qi, Yan Song, and Li-Rong Dai. 2009. Semi-supervised kernel density estimation for video annotation. Computer Vision and Image Understanding 113, 3 (2009), 384--396. Google ScholarDigital Library
- Lu Xia, Chia-Chih Chen, and J. K. Aggarwal. 2012. View invariant human action recognition using histograms of 3D joints. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’12). IEEE, 20--27.Google Scholar
- X. Zabulis, H. Baltzakis, and A. Argyros. 2009. Vision-based hand gesture recognition for human-computer interaction. The Universal Access Handbook. LEA (2009).Google Scholar
Index Terms
- A Real-Time Hand Posture Recognition System Using Deep Neural Networks
Recommendations
A real time vision-based hand gestures recognition system
ISICA'10: Proceedings of the 5th international conference on Advances in computation and intelligenceHand gesture recognition is an important aspect in Human-Computer interaction, and can be used in various applications, such as virtual reality and computer games. In this paper, we propose a real time hand gesture recognition system. It includes three ...
Real-time Hand Tracking Using Kinect
ICDSP '18: Proceedings of the 2nd International Conference on Digital Signal ProcessingReal-time hand tracking is fundamental to human gesture recognition. However, due to the huge computation, previous studies are either off-line or limited to given poses. In order to satisfy the requirement of real-time hand tracking, in this paper we ...
Robust hand posture recognition integrating multi-cue hand tracking
Edutainment'10: Proceedings of the Entertainment for education, and 5th international conference on E-learning and gamesThis paper proposes a robust real-time method for hand tracking and hand posture recognition. Dealing with complex background, scale-invariance and rotation-invariance are the difficulties for hand posture recognition. To solve these difficulties, we ...
Comments