DOI: 10.1145/2647868.2654928
research-article

Latent Tensor Transfer Learning for RGB-D Action Recognition

Published: 3 November 2014

ABSTRACT

This paper proposes a method that compensates for the missing depth modality of target RGB images by transferring depth knowledge from source data. Conventional RGB databases (e.g., the UT-Interaction database) contain no depth information because they are captured with RGB cameras. Methods designed for RGB databases therefore cannot exploit depth information, which proves useful for reducing intra-class variation and for background subtraction. We present a novel transfer learning method that transfers knowledge from depth information to the RGB database and uses this additional source information to recognize human actions in RGB videos. Because the method exploits the 3D geometric information contained in the learned depth data, it can further improve action recognition performance. We treat action data as a fourth-order tensor (row, column, frame, and sample) and apply latent low-rank transfer learning to learn shared subspaces of the source and target databases. Moreover, we introduce a novel cross-modality regularizer that captures the correlation between the RGB and depth modalities, so that more depth information can be transferred from the source database to the target database. The method is extensively evaluated on publicly available databases. Results on two action datasets show that it outperforms existing methods.
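
As a rough illustration of the tensor representation described in the abstract, the sketch below stacks video clips into a fourth-order array (row, column, frame, sample), unfolds it along the sample mode, and applies singular value thresholding, a standard building block of low-rank solvers. It is not the paper's algorithm: the abstract gives no implementation details, so the function names `unfold` and `singular_value_threshold`, the array sizes, the random data, and the threshold `tau` are all assumptions made for the example.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest.

    Column ordering follows NumPy's row-major reshape, which may differ from
    textbook conventions by a column permutation (rank is unaffected).
    """
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def singular_value_threshold(matrix, tau):
    """Shrink every singular value by tau, clipping at zero.

    This is the proximal operator of the nuclear norm, commonly used to
    encourage low-rank solutions.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (u * s_shrunk) @ vt


# Hypothetical sizes: 4 clips, each with 5 frames of 8x6 pixels.
rng = np.random.default_rng(0)
rows, cols, frames, samples = 8, 6, 5, 4
videos = [rng.standard_normal((rows, cols, frames)) for _ in range(samples)]

# Fourth-order tensor with modes (row, column, frame, sample).
actions = np.stack(videos, axis=-1)

# One row per sample, one column per (row, column, frame) entry.
sample_mode = unfold(actions, 3)

# Low-rank approximation of the sample-mode unfolding (tau is illustrative).
low_rank = singular_value_threshold(sample_mode, tau=5.0)
```

Here `actions` has shape `(8, 6, 5, 4)` and `sample_mode` has shape `(4, 240)`. A transfer learning method of the kind the abstract describes would operate on such tensors for both the source and target databases, which this sketch does not attempt.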

        Published in

        MM '14: Proceedings of the 22nd ACM International Conference on Multimedia
        November 2014
        1310 pages
        ISBN: 9781450330633
        DOI: 10.1145/2647868

        Copyright © 2014 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        MM '14 paper acceptance rate: 55 of 286 submissions, 19%
        Overall acceptance rate: 995 of 4,171 submissions, 24%
