2016 | Original Paper | Book Chapter

Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons

Authors: Piotr Koniusz, Anoop Cherian, Fatih Porikli

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

In this paper, we explore tensor representations that can compactly capture higher-order relationships between skeleton joints for 3D action recognition. We first define RBF kernels on 3D joint sequences, which are then linearized to form kernel descriptors. The higher-order outer products of these kernel descriptors form our tensor representations. We present two different kernels for action recognition, namely (i) a sequence compatibility kernel that captures the spatio-temporal compatibility of joints in one sequence against those in the other, and (ii) a dynamics compatibility kernel that explicitly models the action dynamics of a sequence. Tensors formed from these kernels are then used to train an SVM. We present experiments on several benchmark datasets and demonstrate state-of-the-art results, substantiating the effectiveness of our representations.
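As a rough illustration of why outer products of descriptors capture higher-order relationships, the sketch below aggregates third-order outer products of descriptor vectors and checks that the linear kernel between two such tensors equals the sum of cubed pairwise dot products. The random descriptors, their dimension, the order 3, and the helper name `third_order_tensor` are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def third_order_tensor(descriptors):
    """Sum over rows v of the outer product v (x) v (x) v.

    `descriptors` has shape (n, d); the result has shape (d, d, d).
    """
    return np.einsum("ni,nj,nk->ijk", descriptors, descriptors, descriptors)

rng = np.random.default_rng(0)
# Stand-ins for linearized kernel descriptors of two sequences (assumed sizes).
phi_a = rng.standard_normal((5, 4))
phi_b = rng.standard_normal((7, 4))

T_a = third_order_tensor(phi_a)
T_b = third_order_tensor(phi_b)

# Linear kernel between the two tensors (dot product of their unfoldings).
linear = float(np.sum(T_a * T_b))

# Same quantity written as a polynomial kernel over all descriptor pairs.
pairwise = float(np.sum((phi_a @ phi_b.T) ** 3))
```

Because the two quantities agree, a linear SVM on the unfolded tensors behaves like a degree-3 polynomial kernel summed over descriptor pairs, without ever evaluating that kernel between sequences explicitly.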

Footnotes
1
We assume that all sequences have N frames to simplify the presentation. Our formulations apply equally to sequences of arbitrary lengths, e.g., M and N. Therefore, in practice we apply \(G_{\sigma _3}(\frac{s}{M}-\frac{t}{N})\) in Eq. (5).
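A minimal sketch of the length-normalized temporal kernel from this footnote follows. It assumes the usual RBF form G_sigma(u) = exp(-u^2 / (2 sigma^2)), which this page does not spell out; the function names and the value of sigma_3 are illustrative only.

```python
import numpy as np

def gaussian(u, sigma):
    # Assumed RBF form: G_sigma(u) = exp(-u^2 / (2 * sigma^2)).
    return np.exp(-np.square(u) / (2.0 * sigma ** 2))

def time_kernel(s, M, t, N, sigma3):
    """Compare frame s of an M-frame sequence with frame t of an N-frame one.

    Dividing by the sequence lengths puts both frame indices on a common
    relative time axis, so sequences of different lengths are comparable.
    """
    return gaussian(s / M - t / N, sigma3)

# Frames halfway through a 10-frame and a 20-frame sequence are fully compatible.
k_same = time_kernel(5, 10, 10, 20, sigma3=0.1)

# An early frame against a final frame is almost incompatible.
k_far = time_kernel(1, 10, 20, 20, sigma3=0.1)
```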
 
2
In practice, we use \(G'_{\sigma _2}(\mathbf {x}-\mathbf {y})=G_{\sigma _2}(x^{(x)}-y^{(x)})+G_{\sigma _2}(x^{(y)}-y^{(y)})+G_{\sigma _2}(x^{(z)}-y^{(z)})\), so the kernel \(G'_{\sigma _2}(\mathbf {x}-\mathbf {y})\approx [\phi (x^{(x)}); \phi (x^{(y)}); \phi (x^{(z)})]^T[\phi (y^{(x)}); \phi (y^{(y)}); \phi (y^{(z)})]\), but for simplicity we write \(G_{\sigma _2}(\mathbf {x}-\mathbf {y})\approx \phi (\mathbf {x})^T\phi (\mathbf {y})\). Note that (x), (y), (z) are the spatial xyz-components of joints.
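The per-coordinate construction in this footnote can be sketched numerically. This page does not say how the feature map phi is built, so the example assumes one possible linearization: the Gaussian product identity exp(-(x-y)^2 / (2 sigma^2)) = sqrt(2 / (pi sigma^2)) * integral of exp(-(x-z)^2 / sigma^2) exp(-(y-z)^2 / sigma^2) dz, with the integral replaced by a sum over evenly spaced pivots. The pivot grid, sigma, and the assumption that coordinates are normalized to [0, 1] are choices made for the example.

```python
import numpy as np

SIGMA = 0.5                              # assumed kernel bandwidth
PIVOTS = np.linspace(-1.5, 2.5, 81)      # assumed pivot grid, spacing 0.05
STEP = PIVOTS[1] - PIVOTS[0]

def rbf(u, sigma=SIGMA):
    # Assumed RBF form: G_sigma(u) = exp(-u^2 / (2 * sigma^2)).
    return np.exp(-u ** 2 / (2.0 * sigma ** 2))

def phi(x, sigma=SIGMA):
    """Feature map of one scalar coordinate, so phi(x) @ phi(y) ~ rbf(x - y)."""
    scale = np.sqrt(STEP * np.sqrt(2.0 / (np.pi * sigma ** 2)))
    return scale * np.exp(-(x - PIVOTS) ** 2 / sigma ** 2)

def phi_joint(p):
    # Concatenate the maps of the x, y, and z coordinates, as in the footnote.
    return np.concatenate([phi(c) for c in p])

def kernel_sum(p, q):
    # G'(p - q): sum of one-dimensional kernels over the three coordinates.
    return sum(rbf(a - b) for a, b in zip(p, q))

x = np.array([0.2, 0.7, 0.4])            # a joint position (made-up values)
y = np.array([0.5, 0.1, 0.9])

exact = kernel_sum(x, y)
approx = phi_joint(x) @ phi_joint(y)
```

The dot product of the concatenated maps splits into three per-coordinate dot products, which is why the sum of one-dimensional kernels linearizes to a single concatenated vector.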
 
3
Note that this is the length of a vector per sequence after unfolding our tensor representation and removing duplicate coefficients from the symmetries in the tensor.
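To make the effect of removing duplicates concrete: a fully symmetric tensor of order r over d dimensions has C(d + r - 1, r) distinct coefficients, one per multiset of indices. The sketch below counts them and unfolds a small symmetric tensor accordingly. The values d = 4 and r = 3 and the helper names are illustrative assumptions, not the sizes used in the paper.

```python
from itertools import combinations_with_replacement
from math import comb

import numpy as np

def unique_coefficient_count(d, r):
    # Number of index multisets of size r drawn from d values.
    return comb(d + r - 1, r)

def unfold_symmetric(tensor):
    """Keep one coefficient per sorted index tuple (i <= j <= k <= ...)."""
    d, r = tensor.shape[0], tensor.ndim
    index_tuples = combinations_with_replacement(range(d), r)
    return np.array([tensor[idx] for idx in index_tuples])

v = np.arange(1.0, 5.0)                      # toy 4-dimensional descriptor
T = np.einsum("i,j,k->ijk", v, v, v)         # symmetric third-order tensor
vec = unfold_symmetric(T)

full_size = T.size                           # 4 * 4 * 4 coefficients
reduced_size = vec.size                      # coefficients left after deduplication
```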
 
Metadata
Title
Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons
Authors
Piotr Koniusz
Anoop Cherian
Fatih Porikli
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46493-0_3