
2016 | Original Paper | Book Chapter

Human Joint Angle Estimation and Gesture Recognition for Assistive Robotic Vision

Authors: Alp Guler, Nikolaos Kardaris, Siddhartha Chandra, Vassilis Pitsikalis, Christian Werner, Klaus Hauer, Costas Tzafestas, Petros Maragos, Iasonas Kokkinos

Published in: Computer Vision – ECCV 2016 Workshops

Publisher: Springer International Publishing


Abstract

We explore new directions for automatic human gesture recognition and human joint angle estimation, as applied to human-robot interaction in the context of a challenging real-life assistive-living task with elderly subjects. Our contributions include state-of-the-art approaches for both low- and mid-level vision, as well as for higher-level action and gesture recognition. The first direction investigates a deep-learning-based framework for the challenging task of human joint angle estimation on noisy real-world RGB-D images. The second direction employs dense trajectory features for online video processing, enabling automatic gesture recognition with real-time performance. Our approaches are evaluated both qualitatively and quantitatively on a newly acquired dataset constructed around this real-life assistive-living scenario.
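The paper estimates joint angles from noisy RGB-D input with a learned model; as a minimal illustration of the quantity being estimated, a joint angle can be derived geometrically from three body keypoints (e.g. shoulder, elbow, wrist) produced by any pose estimator. The function name and NumPy-based sketch below are ours, not taken from the paper:

```python
import numpy as np

def joint_angle(a, b, c):
    """Interior angle at joint b (in degrees) formed by keypoints a-b-c.

    Each point is an (x, y) image coordinate, e.g. shoulder, elbow and
    wrist positions from a pose estimator; the result is the elbow angle.
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    u, v = a - b, c - b                       # limb vectors meeting at b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: perpendicular upper and lower arm give a right angle.
print(joint_angle((0, 1), (0, 0), (1, 0)))  # 90.0
```

The `np.clip` guards against floating-point round-off pushing the cosine slightly outside [-1, 1], which would make `arccos` return NaN for a fully extended limb.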


Footnotes
1
The 8 selected gestures are: “Help”, “WantStandUp”, “PerformTask”, “WantSitDown”, “ComeCloser”, “ComeHere”, “LetsGo”, “Park”.
 
Metadata
Title
Human Joint Angle Estimation and Gesture Recognition for Assistive Robotic Vision
Authors
Alp Guler
Nikolaos Kardaris
Siddhartha Chandra
Vassilis Pitsikalis
Christian Werner
Klaus Hauer
Costas Tzafestas
Petros Maragos
Iasonas Kokkinos
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-48881-3_29
