2016 | Original Paper | Book Chapter

Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition

Authors: César Roberto de Souza, Adrien Gaidon, Eleonora Vig, Antonio Manuel López

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty of acquiring and learning from large quantities of video data. Deep learning, although a breakthrough for image classification and showing promise for videos, has still not clearly superseded action recognition methods using hand-crafted features, even when trained on massive datasets. In this paper, we introduce hybrid video classification architectures based on carefully designed unsupervised representations of hand-crafted spatio-temporal features classified by supervised deep networks. As we show in our experiments on five popular benchmarks for action recognition, our hybrid model combines the best of both worlds: it is data efficient (trained on 150 to 10,000 short clips) and yet improves significantly on the state of the art, including recent deep models trained on millions of manually labelled images and videos.
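The pipeline the abstract describes has two stages: hand-crafted spatio-temporal descriptors (e.g., dense trajectories) are encoded by an unsupervised Fisher-vector stage, and the resulting video-level vectors are classified by a supervised neural network. The following is a minimal, illustrative sketch of that hybrid scheme using scikit-learn; the descriptor dimensions, GMM size, and network shape are assumptions chosen for demonstration, not the authors' actual configuration.

```python
# Sketch of a hybrid architecture: unsupervised Fisher-vector encoding
# of hand-crafted descriptors, followed by a supervised neural classifier.
# All sizes and hyperparameters below are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

def fisher_vector(descriptors, gmm):
    """Encode local descriptors (N x D) as a Fisher vector w.r.t. a
    diagonal-covariance GMM: gradients w.r.t. means and variances,
    with power and L2 normalization (improved FV of Perronnin et al.)."""
    q = gmm.predict_proba(descriptors)                 # N x K soft assignments
    n, _ = descriptors.shape
    mu, sigma = gmm.means_, np.sqrt(gmm.covariances_)  # each K x D
    parts = []
    for k in range(gmm.n_components):
        z = (descriptors - mu[k]) / sigma[k]           # normalized residuals
        qk, w = q[:, k:k + 1], gmm.weights_[k]
        g_mu = (qk * z).sum(axis=0) / (n * np.sqrt(w))
        g_sig = (qk * (z ** 2 - 1)).sum(axis=0) / (n * np.sqrt(2 * w))
        parts.extend([g_mu, g_sig])
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))             # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)           # L2 normalization

# Toy data standing in for dense-trajectory descriptors (e.g., HOG/HOF/MBH).
rng = np.random.default_rng(0)
videos = [rng.normal(size=(200, 64)) for _ in range(40)]  # 40 short clips
labels = rng.integers(0, 3, size=40)                      # 3 action classes

# Unsupervised stage: fit a GMM vocabulary on the pooled descriptors.
gmm = GaussianMixture(n_components=16, covariance_type='diag',
                      random_state=0).fit(np.vstack(videos))
X = np.stack([fisher_vector(v, gmm) for v in videos])

# Supervised stage: a small fully connected network on top of the FVs.
clf = MLPClassifier(hidden_layer_sizes=(512,), max_iter=300,
                    random_state=0).fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```

Because the Fisher-vector stage is fitted without labels, only the small classifier on top needs supervision, which is consistent with the abstract's claim of data efficiency on as few as 150 training clips.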


Appendices
Accessible only with authorization
Metadata
Title
Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition
Authors
César Roberto de Souza
Adrien Gaidon
Eleonora Vig
Antonio Manuel López
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46478-7_43