2020 | OriginalPaper | Chapter

Multi-view Robust Gesture Recognition for Assistive Interfaces

Abstract

In this paper, we propose a gesture recognition approach that uses a multi-view setup for assistive device applications. As smart assistive devices become a reality, the need to interact with them naturally, as we do with other humans, becomes crucial. Gestures are a fundamental modality of human interaction, being natural and intuitive. Our approach relies on the motion of upper-body joints, so that individuals with motor dysfunctions who use wheelchairs or cannot stand can also interact with their smart assistive devices. To achieve this goal, we propose a robust multi-view skeleton fusion based on Kalman filtering, followed by the extraction of handcrafted upper-body features. Gestures are then classified with a support vector machine (SVM). Experiments on our captured dataset showed that the method generalizes well and that the multi-view fusion outperforms the individual views.
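The fusion step can be illustrated with a minimal sketch. The abstract does not specify the filter's state model, noise parameters, or fusion scheme, so everything below is an assumption for illustration only: a random-walk (constant-position) model for a single joint, measurements already expressed in a common world frame, and sequential updates with one assumed noise variance per view. The names `JointKalman`, `process_var`, `init_var`, and `view_vars` are hypothetical and do not come from the chapter.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vec3 = Sequence[float]


@dataclass
class JointKalman:
    """Random-walk Kalman filter for one 3D joint (illustrative sketch only)."""
    process_var: float = 1e-3  # assumed per-frame motion noise (hypothetical value)
    init_var: float = 1.0      # assumed initial uncertainty (hypothetical value)

    def __post_init__(self) -> None:
        self.x: Optional[List[float]] = None  # fused position estimate (x, y, z)
        # One variance is shared by all axes because every axis sees the
        # same process noise and the same per-view measurement noise.
        self.p: float = self.init_var

    def step(self, views: Sequence[Optional[Vec3]],
             view_vars: Sequence[float]) -> List[float]:
        """Fuse one frame. views[i] is the joint as seen by camera i in a
        common world frame, or None when that camera lost the joint."""
        seen = [(z, r) for z, r in zip(views, view_vars) if z is not None]
        if self.x is None:
            if not seen:
                raise ValueError("no view observed the joint in the first frame")
            # Initialise the state from the first available view.
            self.x = list(seen[0][0])
            self.p = seen[0][1]
            seen = seen[1:]
        else:
            # Predict: position is kept, uncertainty grows.
            self.p += self.process_var
        # Update: apply each remaining view's measurement in turn.
        for z, r in seen:
            k = self.p / (self.p + r)  # Kalman gain
            self.x = [xi + k * (zi - xi) for xi, zi in zip(self.x, z)]
            self.p = (1.0 - k) * self.p
        return list(self.x)
```

With two equally trusted views the first fused estimate is their midpoint, and a frame in which one camera loses the joint is still handled by the remaining view. A view with a larger entry in `view_vars` pulls the estimate less, which is one simple way to express that some cameras are less reliable for a given joint.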

Metadata
Title
Multi-view Robust Gesture Recognition for Assistive Interfaces
Authors
João Paulo
Pedro Girão
Paulo Peixoto
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-31635-8_205