research-article

Latent Tensor Transfer Learning for RGB-D Action Recognition

Authors:
Chengcheng Jia, Northeastern University, Boston, USA
Yu Kong, Northeastern University, Boston, USA
Zhengming Ding, Northeastern University, Boston, USA
Yun Raymond Fu, Northeastern University, Boston, MA, USA

MM '14: Proceedings of the 22nd ACM international conference on Multimedia, November 2014, Pages 87–96
https://doi.org/10.1145/2647868.2654928

Published: 03 November 2014

ABSTRACT

This paper proposes a method that augments the original target RGB images into RGB-D data by transferring depth knowledge from source data. Conventional RGB databases (e.g., the UT-Interaction database) do not contain depth information because they are captured by RGB cameras. Methods designed for RGB databases therefore cannot take advantage of depth information, which proves useful for simplifying intra-class variations and for background subtraction. In this paper, we present a novel transfer learning method that transfers knowledge from depth information to the RGB database and uses the additional source information to recognize human actions in RGB videos. Our method takes full advantage of the 3D geometric information contained in the learned depth data and can thus further improve action recognition performance. We treat action data as a fourth-order tensor (row, column, frame, and sample) and apply latent low-rank transfer learning to learn shared subspaces of the source and target databases. Moreover, we introduce a novel cross-modality regularizer that plays an important role in finding the correlation between the RGB and depth modalities, so that more depth information can be transferred from the source database to the target database. Our method is extensively evaluated on publicly available databases. Results on two action datasets show that our method outperforms existing methods.
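The abstract gives no formulas, so the following is only an illustrative sketch of two building blocks commonly used in low-rank tensor learning, not the authors' algorithm. It assumes equally sized grayscale clips, stacks them into a fourth-order tensor (row, column, frame, sample), unfolds that tensor along one mode, and applies singular value thresholding, a standard step for nuclear-norm (low-rank) penalties. The function names `stack_videos`, `unfold`, and `singular_value_threshold`, the toy sizes, and the threshold value are all hypothetical choices made for this example.

```python
import numpy as np


def stack_videos(videos):
    """Stack clips of shape (rows, cols, frames) into a fourth-order
    tensor of shape (rows, cols, frames, samples)."""
    return np.stack(videos, axis=-1)


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest.
    The column ordering follows NumPy's row-major layout, which is one of
    several conventions in use."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def singular_value_threshold(matrix, tau):
    """Soft-threshold the singular values by tau (the proximal operator of
    tau times the nuclear norm), which pushes the matrix toward low rank."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


# Toy data standing in for action clips (an assumption, not real video).
rng = np.random.default_rng(0)
clips = [rng.standard_normal((8, 6, 5)) for _ in range(4)]

actions = stack_videos(clips)        # (rows, cols, frames, samples)
frame_mode = unfold(actions, 2)      # frames along the rows
low_rank = singular_value_threshold(frame_mode, tau=10.0)
```

In a full transfer learning method, steps like these would sit inside an iterative solver that also learns the shared subspaces and the cross-modality term; none of that is reproduced here.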

Index Terms

Latent Tensor Transfer Learning for RGB-D Action Recognition
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
2. Information systems
  1. Information systems applications
    1. Data mining

Published in
MM '14: Proceedings of the 22nd ACM international conference on Multimedia
November 2014
1310 pages
ISBN:9781450330633
DOI:10.1145/2647868
General Chairs:
Kien A. Hua, University of Central Florida, USA
Yong Rui, Microsoft Research, China
Ralf Steinmetz, Technische Universität Darmstadt, Germany
Program Chairs:
Alan Hanjalic, Delft University of Technology, Netherlands
Apostol (Paul) Natsev, Google, USA
Wenwu Zhu, Tsinghua University, China
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
latent tensor
rgb-d action recognition
transfer learning
Qualifiers
- research-article
Acceptance Rates
MM '14 Paper Acceptance Rate: 55 of 286 submissions, 19%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%