
A Real-Time Hand Posture Recognition System Using Deep Neural Networks

Published: 31 March 2015

Abstract

Hand posture recognition (HPR) is a challenging task, both because hands are difficult to detect and track with ordinary cameras and because traditional, manually selected features are limited. In this article, we propose a two-stage HPR system for sign language recognition using a Kinect sensor. In the first stage, we propose an effective algorithm for hand detection and tracking. The algorithm combines color and depth information and does not require a uniformly colored or stable background. It handles situations in which the hands are very close to other parts of the body or are not the nearest objects to the camera, and it tolerates occlusion of the hands by the face or by the other hand. In the second stage, we apply deep neural networks (DNNs) to automatically learn features from hand posture images that are insensitive to movement, scaling, and rotation. Experiments verify that the proposed system runs quickly and accurately, achieving a recognition accuracy as high as 98.12%.
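
The abstract does not give the details of the first-stage detector, so the following is only a minimal sketch of the general idea of combining color and depth cues, not the authors' algorithm. It marks a pixel as a hand candidate when it is both skin-colored and inside a depth window. The function name `hand_candidate_mask`, the default Cb/Cr thresholds (a commonly cited skin-color range), and the caller-supplied depth window are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (Y, Cb, Cr), each in 0..255


def hand_candidate_mask(
    ycbcr: Sequence[Sequence[Pixel]],
    depth_mm: Sequence[Sequence[int]],
    depth_range: Tuple[int, int],
    cb_range: Tuple[int, int] = (77, 127),   # assumed skin-color range for Cb
    cr_range: Tuple[int, int] = (133, 173),  # assumed skin-color range for Cr
) -> List[List[bool]]:
    """Return a mask that is True where a pixel passes both the color and depth tests.

    This is an illustrative fusion rule only; the depth window is supplied by
    the caller (for example, from a tracker) rather than assumed to be the
    nearest object to the camera.
    """
    near, far = depth_range
    mask: List[List[bool]] = []
    for color_row, depth_row in zip(ycbcr, depth_mm):
        row: List[bool] = []
        for (_, cb, cr), d in zip(color_row, depth_row):
            # Color cue: chrominance falls inside the assumed skin range.
            skin = cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
            # Depth cue: pixel lies inside the window; a reading of 0
            # (no valid depth) fails for any window with near > 0.
            in_window = near <= d <= far
            row.append(skin and in_window)
        mask.append(row)
    return mask


# Tiny 2x2 example: only the top-left pixel is skin-colored AND in the window.
ycbcr = [[(120, 100, 150), (120, 100, 150)],
         [(120, 60, 200), (120, 100, 150)]]
depth = [[800, 2500],
         [800, 0]]
mask = hand_candidate_mask(ycbcr, depth, depth_range=(600, 1000))
print(mask)
```

In this toy input, the top-right pixel is rejected by depth, the bottom-left by color, and the bottom-right because its depth reading is missing. A real system would still need connected-component analysis and tracking to separate hands from the face and from each other.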

        Published in

        ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 2
        Special Section on Visual Understanding with RGB-D Sensors
        May 2015
        381 pages
        ISSN: 2157-6904
        EISSN: 2157-6912
        DOI: 10.1145/2753829
        Editor: Huan Liu

        Copyright © 2015 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 31 March 2015
        • Accepted: 1 January 2014
        • Revised: 1 November 2013
        • Received: 1 July 2013

        Qualifiers

        • research-article
        • Research
        • Refereed
