ABSTRACT
This paper proposes a method that compensates for the depth information missing from target RGB images by transferring depth knowledge from source RGB-D data. Conventional RGB databases (e.g., the UT-Interaction database) do not contain depth information because they are captured by RGB cameras. Methods designed for RGB databases therefore cannot take advantage of depth information, which proves useful for reducing intra-class variations and simplifying background subtraction. In this paper, we present a novel transfer learning method that transfers knowledge from depth information to the RGB database and uses this additional source information to recognize human actions in RGB videos. Our method takes full advantage of the 3D geometric information contained in the learned depth data and can thus further improve action recognition performance. We treat action data as a fourth-order tensor (row, column, frame, and sample) and apply latent low-rank transfer learning to learn shared subspaces of the source and target databases. Moreover, we introduce a novel cross-modality regularizer that plays an important role in finding the correlation between the RGB and depth modalities, so that more depth information can be transferred from the source database to the target database. Our method is extensively evaluated on publicly available databases. Results on two action datasets show that our method outperforms existing methods.
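As an illustration of the fourth-order tensor view mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes equally sized grayscale clips, and the helper names `stack_videos` and `unfold` are hypothetical. It stacks clips along a sample axis and flattens the tensor along one mode, the usual first step before applying a matrix low-rank constraint. The column ordering produced here may differ from other unfolding conventions, which does not change the rank of the unfolded matrix.

```python
import numpy as np

def stack_videos(videos):
    """Stack equally sized clips (rows x cols x frames) into a 4th-order tensor."""
    # Resulting shape: (rows, cols, frames, samples)
    return np.stack(videos, axis=-1)

def unfold(tensor, mode):
    """Mode-k unfolding: bring axis `mode` to the front, flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# Synthetic stand-in data (assumption): 5 clips of 32 x 24 pixels, 10 frames each.
rng = np.random.default_rng(0)
clips = [rng.random((32, 24, 10)) for _ in range(5)]

X = stack_videos(clips)        # fourth-order tensor
X_rows = unfold(X, 0)          # row mode: 32 x (24 * 10 * 5)
X_samples = unfold(X, 3)       # sample mode: one flattened clip per row

print(X.shape, X_rows.shape, X_samples.shape)
```

Unfolding along the sample mode gives one vectorized clip per row, which is the layout a matrix-based subspace method would consume; the other modes expose row, column, and temporal structure separately.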
Index Terms
- Latent Tensor Transfer Learning for RGB-D Action Recognition