Published in: International Journal of Computer Vision, Issue 1-2/2014

1 August 2014

Harnessing Lab Knowledge for Real-World Action Recognition

Authors: Zhigang Ma, Yi Yang, Feiping Nie, Nicu Sebe, Shuicheng Yan, Alexander G. Hauptmann


Abstract

Much research on human action recognition has focused on improving performance on lab-collected datasets. Real-world videos, however, are more diverse and contain more complicated actions, and often only a few of them are precisely labeled, which makes recognizing actions in these videos challenging. The paucity of labeled real-world videos motivates us to “borrow” strength from other resources. Specifically, since many lab datasets are available, we propose to harness them to facilitate action recognition in real-world videos, provided that the lab and real-world datasets are related. Because their action categories are usually inconsistent, we design a multi-task learning framework that jointly optimizes the classifiers for both sides. A general Schatten \(p\)-norm is imposed on the two classifiers to explore the knowledge they share. In this way, the framework can mine the shared knowledge between two datasets even when they have different action categories, which is a major virtue of our method. The shared knowledge is then used to improve action recognition in the real-world videos. Extensive experiments on real-world datasets show promising results.
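To make the regularizer mentioned in the abstract concrete, the following is a minimal sketch of how a Schatten \(p\)-norm can be computed from singular values. The abstract does not say how the two classifiers are combined, so stacking them column-wise into one matrix `W` is an assumption made here for illustration, as are the names `W_lab` and `W_real`, the dimensions, and the random values.

```python
import numpy as np

def schatten_p_norm(W, p):
    """Schatten p-norm: the l_p norm of the singular values of W.

    For 0 < p < 1 this is a quasi-norm rather than a true norm.
    """
    singular_values = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(singular_values ** p) ** (1.0 / p))

# Hypothetical classifiers: d features, different numbers of action
# categories on the lab side and the real-world side.
rng = np.random.default_rng(0)
d, c_lab, c_real = 20, 6, 4
W_lab = rng.standard_normal((d, c_lab))
W_real = rng.standard_normal((d, c_real))

# Assumption: the two classifiers are stacked column-wise so that a
# single spectral penalty couples them through a shared feature space.
W = np.hstack([W_lab, W_real])

for p in (0.5, 1.0, 2.0):
    print(p, schatten_p_norm(W, p))
```

With \(p = 1\) the function reduces to the nuclear (trace) norm and with \(p = 2\) to the Frobenius norm. The two classifiers only need to share the feature dimension, not the number of categories, which is consistent with the abstract's point that the action categories may differ.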


Footnotes
1
Equation (7) is non-differentiable in the neighborhood of the optimum. Hence, in the implementation, \(D\) can be defined as \(D=\frac{p}{2}(PP^T + \varsigma I)^{\frac{p-2}{2}}\), where \(\varsigma\) is a small positive constant and \(I\) is the identity matrix.
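As an illustration of the smoothed definition in this footnote, here is a minimal sketch that evaluates \(D\) through an eigendecomposition of the symmetric matrix \(PP^T + \varsigma I\). The function name `smoothed_D`, the default value of `varsigma`, and the small example matrix are assumptions for illustration; the footnote does not specify how \(P\) is built or how the matrix power is computed.

```python
import numpy as np

def smoothed_D(P, p, varsigma=1e-8):
    """Evaluate D = (p/2) * (P P^T + varsigma * I)^((p - 2) / 2).

    The fractional matrix power is taken via the eigendecomposition of
    the symmetric positive definite matrix P P^T + varsigma * I.
    """
    d = P.shape[0]
    M = P @ P.T + varsigma * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)
    powered = eigvals ** ((p - 2) / 2.0)
    # Scale each eigenvector column by its powered eigenvalue, then recombine.
    return (p / 2.0) * (eigvecs * powered) @ eigvecs.T

# Hypothetical 3 x 2 matrix with singular values 3 and 4.
P = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])
D = smoothed_D(P, p=1.0)
print(np.trace(P.T @ D @ P))
```

Without the \(\varsigma I\) term, the zero eigenvalue of \(PP^T\) in this example would be raised to a negative power whenever \(p < 2\), so the small constant keeps \(D\) finite. For small \(\varsigma\), \(\mathrm{Tr}(P^T D P)\) approaches \(\frac{p}{2}\sum_i \sigma_i^p\), which here is \(\frac{1}{2}(3 + 4) = 3.5\).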
 
Metadata
Title
Harnessing Lab Knowledge for Real-World Action Recognition
Authors
Zhigang Ma
Yi Yang
Feiping Nie
Nicu Sebe
Shuicheng Yan
Alexander G. Hauptmann
Publication date
1 August 2014
Publisher
Springer US
Published in
International Journal of Computer Vision, Issue 1-2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-014-0717-5
