Published in: Neural Computing and Applications 10/2021

15.09.2020 | Original Article

Convolutional neural network with spatial pyramid pooling for hand gesture recognition

Authors: Yong Soon Tan, Kian Ming Lim, Connie Tee, Chin Poo Lee, Cheng Yaw Low

Abstract

Hand gestures provide a means for humans to interact. Beyond their significant role in human–computer interaction, hand gestures also break down the communication barrier and simplify communication between the general public and the hearing-impaired community. This paper presents a convolutional neural network (CNN) integrated with spatial pyramid pooling (SPP), dubbed CNN–SPP, for vision-based hand gesture recognition. SPP mitigates a limitation of conventional pooling by stacking multi-level pooling to extend the features fed into a fully connected layer. Given inputs of varying sizes, SPP also yields a fixed-length feature representation. Extensive experiments evaluate CNN–SPP on two well-known American Sign Language (ASL) datasets and one NUS hand gesture dataset. The empirical results show that CNN–SPP outperforms the other deep learning-based methods it is compared against.
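
The fixed-length property of SPP can be illustrated with a minimal sketch. It assumes max pooling inside each bin and pyramid levels of 1×1, 2×2 and 4×4; the abstract states neither choice. The function name `spp_max_pool` and the plain nested-list representation of feature maps are hypothetical and used only for illustration; this is not the authors' implementation.

```python
def spp_max_pool(feature_maps, levels=(1, 2, 4)):
    """Pool 2-D feature maps (one nested list per channel) into a flat vector.

    For each pyramid level n, every map is divided into an n x n grid of
    bins and the maximum of each bin is kept. The output length is
    channels * sum(n * n for n in levels), independent of map height/width.
    Assumption: max pooling and these levels are illustrative choices.
    """
    pooled = []
    for n in levels:
        for fmap in feature_maps:
            h, w = len(fmap), len(fmap[0])
            for i in range(n):
                # Row range of bin i: floor(i*h/n) .. ceil((i+1)*h/n)
                r0 = (i * h) // n
                r1 = ((i + 1) * h + n - 1) // n
                for j in range(n):
                    # Column range of bin j, computed the same way
                    c0 = (j * w) // n
                    c1 = ((j + 1) * w + n - 1) // n
                    pooled.append(
                        max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
                    )
    return pooled


# Two single-channel maps of different spatial size give vectors of equal length.
small = [[[4 * r + c for c in range(4)] for r in range(4)]]   # 4 x 4
large = [[[7 * r + c for c in range(7)] for r in range(9)]]   # 9 x 7
print(len(spp_max_pool(small)), len(spp_max_pool(large)))      # 21 21
```

Because the number of bins depends only on the pyramid levels, the concatenated vector always has the same length per channel (1 + 4 + 16 = 21 here), which is what allows a fully connected layer to follow the convolutional layers without fixing the input image size.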

Metadata
Title
Convolutional neural network with spatial pyramid pooling for hand gesture recognition
Authors
Yong Soon Tan
Kian Ming Lim
Connie Tee
Chin Poo Lee
Cheng Yaw Low
Publication date
15.09.2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 10/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05337-0
