2020 | OriginalPaper | Book chapter

Multi-view Robust Gesture Recognition for Assistive Interfaces

Abstract

In this paper, we propose a gesture recognition approach that uses a multi-view setup for assistive device applications. As smart assistive devices become a reality, the need to interact with them naturally, as we do with other humans, becomes crucial. Gestures are a fundamental modality of human interaction, being natural and intuitive. Our approach relies on the motion of upper-body joints, so that individuals with motor dysfunctions who use wheelchairs or cannot stand can also interact with their smart assistive devices. To achieve this goal, we propose a robust multi-view skeleton fusion based on Kalman filtering, followed by the extraction of handcrafted upper-body features. Gestures are then classified with a support vector machine (SVM). Experiments on our captured dataset showed that the method generalizes well and that the multi-view fusion outperforms the individual views.
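
To illustrate the idea of Kalman-based multi-view fusion, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the state model, the noise parameters, or how views are weighted, so everything here is an assumption: a one-dimensional random-walk filter per joint coordinate, one measurement per view with an assumed variance, and sequential updates (which, for independent measurement noise, give the same result as a joint update). The names `ScalarKalman` and `fuse_frame` and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a random-walk model (an assumed model)."""
    x: float = 0.0    # current estimate of one joint coordinate
    p: float = 1e3    # estimate variance; large value = uninformed prior
    q: float = 1e-3   # process noise variance (assumed value)

    def predict(self) -> None:
        # Random walk: the estimate is unchanged, its uncertainty grows.
        self.p += self.q

    def update(self, z: float, r: float) -> None:
        # Standard scalar Kalman update for measurement z with variance r.
        k = self.p / (self.p + r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k


def fuse_frame(
    kf: ScalarKalman,
    views: Sequence[Optional[float]],
    variances: Sequence[float],
) -> float:
    """Fuse one frame of per-view measurements of a single coordinate.

    A view that lost track of the joint is passed as None and skipped.
    """
    kf.predict()
    for z, r in zip(views, variances):
        if z is not None:
            kf.update(z, r)
    return kf.x


if __name__ == "__main__":
    kf = ScalarKalman()
    # Two views observe the same coordinate; the second is assumed noisier.
    for frame in [(0.50, 0.58), (0.52, None), (0.55, 0.49)]:
        print(round(fuse_frame(kf, frame, (0.01, 0.04)), 3))
```

In this sketch a full skeleton would use one filter per joint coordinate, and views with lower assumed variance pull the fused estimate more strongly, which is one simple way a fused skeleton can be more stable than any single view.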

Metadata
Title
Multi-view Robust Gesture Recognition for Assistive Interfaces
Written by
João Paulo
Pedro Girão
Paulo Peixoto
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-31635-8_205
