2015 | OriginalPaper | Chapter

uulmMAD – A Human Action Recognition Dataset for Ground-Truth Evaluation and Investigation of View Invariances

Authors: Michael Glodek, Georg Layher, Felix Heilemann, Florian Gawrilowicz, Günther Palm, Friedhelm Schwenker, Heiko Neumann

Published in: Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction

Publisher: Springer International Publishing

Abstract

In recent years, human action recognition has gained increasing attention in pattern recognition. However, many datasets in the literature focus on a limited number of target-oriented properties. In this work, we present a novel dataset, named uulmMAD, which has been created to benchmark state-of-the-art action recognition architectures with respect to multiple properties, e.g. high-resolution cameras, perspective changes, realistic cluttered background and noise, overlap of action classes, different execution speeds, variability in subjects and their clothing, and the availability of a pose ground truth. The uulmMAD was recorded using three synchronized high-resolution cameras and an inertial motion capturing system. Each subject performed fourteen actions at least three times in front of a green screen. Selected actions were recorded in four variants, i.e. normal, pausing, fast and deceleration. The data has been post-processed in order to separate the subject from the background. Furthermore, the camera and the motion capturing data have been mapped onto each other, and 3D avatars have been generated to further extend the dataset. The avatars have also been used to emulate the self-occlusion that occurs in pose recognition when using a time-of-flight camera. In this work, we analyze the uulmMAD using a state-of-the-art action recognition architecture to provide first baseline results. The results emphasize the unique characteristics of the dataset. The dataset will be made publicly available upon publication of the paper.
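
The abstract states that subjects were recorded in front of a green screen and later separated from the background. As a rough illustration of that idea only, the following minimal sketch marks a pixel as background when its green channel dominates the red and blue channels. The function name `green_screen_mask`, the `margin` threshold and the tiny example frame are assumptions made for this sketch; it is not the post-processing method used by the authors.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def green_screen_mask(image: List[List[Pixel]], margin: int = 40) -> List[List[bool]]:
    """Return True where a pixel is judged to be foreground (the subject).

    Hypothetical rule: a pixel counts as green-screen background when its
    green value exceeds both red and blue by more than `margin`.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            is_background = g - max(r, b) > margin
            mask_row.append(not is_background)
        mask.append(mask_row)
    return mask


# Tiny 2x2 example frame: two green-screen pixels, one skin-like pixel,
# and one dark (clothing-like) pixel.
frame = [
    [(20, 200, 30), (210, 160, 140)],
    [(30, 30, 30), (10, 180, 20)],
]
mask = green_screen_mask(frame)
```

A hard threshold like this produces a binary mask with jagged edges; production chroma keyers typically compute a soft alpha value per pixel instead.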

Footnotes
1. Pike F-145 from Allied Vision with a Tevidon 1.8/16 lens.
2. Poser™ is a 3D modeling software for human avatars by Smith Micro Software.

Metadata
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-14899-1_8
