Evaluation of semi-supervised learning method on action recognition

Shen, Haoquan; Yan, Yan; Xu, Shicheng; Ballas, Nicolas; Chen, Wenzhi

doi:10.1007/s11042-014-1936-z

Published: 25 March 2014

Volume 74, pages 523–542, (2015)

Multimedia Tools and Applications

Haoquan Shen¹, Yan Yan², Shicheng Xu¹, Nicolas Ballas³ & Wenzhi Chen¹

Abstract

Action recognition is one of the most difficult problems in computer vision and multimedia, since both spatial information and spatiotemporal semantic meaning must be taken into account. Noisy and weakly annotated data make the task even harder. Recently, many new approaches that go beyond traditional features and classifiers have made action recognition a promising task. However, no prior work compares different combinations of pooling and semi-supervised learning methods under the same experimental setting. We therefore apply different combinations of pooling and semi-supervised learning methods to both synthetic and realistic action recognition datasets to see which combination or method performs better. Our experiments support the following conclusions. First, Second Order Pooling (Carreira et al. 2012) performs worse overall than the traditional Bag of Words (Schmid and Mohr 1997; Dance et al. 2004) on some datasets, but it is a good way to speed up the coding stage of video classification with little sacrifice in performance. Second, the semi-supervised Hierarchical Regression Algorithm (MLHR) and Manifold Regularized Least Square Regression (MRLS) (Belkin et al. 2006) are better than some supervised learning methods (χ²-SVM, SVM-2K (Farquhar et al. 2006)) on real-world action recognition problems in which little annotated information is available. Third, for the KTH, UCF50 and HMDB datasets, late fusion does not necessarily improve performance. In comparison, MLHR, SVM-2K and Multi-kernel Learning are more natural ways to deal with multi-feature problems.
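To make the idea behind manifold-regularized least squares concrete, the following is a minimal sketch of a generic linear Laplacian-regularized least-squares classifier in the spirit of Belkin et al. (2006). It is not the paper's implementation: the abstract does not give the exact MRLS formulation, and the function names, the binary k-nearest-neighbour graph, and the parameters `gamma_a`, `gamma_i` and `k` are all assumptions chosen for illustration. Labeled samples drive the squared-error term, while the graph Laplacian built from labeled and unlabeled samples together penalizes predictions that vary between neighbouring points.

```python
import numpy as np


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph (binary weights)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def fit_linear_laprls(X, y, labeled, gamma_a=1e-2, gamma_i=1e-1, k=5):
    """Closed-form linear Laplacian-regularized least squares (illustrative sketch).

    X: (n, d) features of all samples, labeled and unlabeled.
    y: (n,) targets in {-1, +1}; only the entries at `labeled` are read.
    labeled: indices of the labeled samples.
    gamma_a, gamma_i: assumed weights of the ridge and manifold penalties.
    """
    n, d = X.shape
    L = knn_graph_laplacian(X, k)
    Xl, yl = X[labeled], y[labeled]
    m = len(labeled)
    # Normal equations of:
    #   (1/m) ||Xl w - yl||^2 + gamma_a ||w||^2 + (gamma_i / n^2) w^T X^T L X w
    A = Xl.T @ Xl / m + gamma_a * np.eye(d) + (gamma_i / n ** 2) * (X.T @ L @ X)
    b = Xl.T @ yl / m
    return np.linalg.solve(A, b)


def predict(X, w):
    """Sign of the linear score, mapped to {-1, +1}."""
    return np.where(X @ w >= 0.0, 1.0, -1.0)
```

With only a few labeled points per class, `fit_linear_laprls` still uses every sample to build the graph, which is the sense in which such methods exploit unlabeled data. A kernelized variant and a bias term are omitted here to keep the sketch short.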

References

Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 12:2399–2434
Carreira J, Caseiro R, Batista J, Sminchisescu C (2012) Semantic segmentation with second-order pooling. In: ECCV
Chen M, Hauptmann A (2009) Mosift, recognizing human actions in surveillance videos
Dance C, Willamowski J, Fan L, Bray C, Csurka G (2004) Visual categorization with bags of keypoints. In: ECCV SLCV workshop
Farquhar JDR, Meng H, Szedmak S, Hardoon DR, Shawe-Taylor J (2006) Two view learning: SVM-2K, theory and practice. In: Advances in neural information processing systems. MIT Press
Han Y, Xu Z, Ma Z, Huang Z (2013) Image classification with manifold learning for out-of-sample data. Signal Process 93(8):2169–2177
Han Y, Yang Y, Ma Z, Shen H, Sebe N, Zhou X (2014) Image attribute adaptation. IEEE Trans Multimed (IEEE T-MM). doi:10.1109/TMM.2014.2306092
Han Y, Zhang J, Xu Z, Yu S (2013) Discriminative multi-task feature selection. In: AAAI
Hotelling H (1936) Relations between two sets of variates. Biometrika 28(3):321–377
Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) Hmdb: a large video database for human motion recognition. In: ICCV
Lan Z, Bao L, Yu S, Liu W, Hauptmann A (2012) Double fusion for multimedia event detection. In: ACM MM
Lew M, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: state-of-the-art and challenges. ACM Trans Multimed Comput Commun Appl 2(1):1–19
Ma Z, Nie F, Yang Y, Uijlings J, Sebe N, Hauptmann AG (2012) Discriminating joint feature analysis for multimedia data understanding. IEEE Trans Multimed (TMM) 14(6):1662–1672
Ma Z, Yang Y, Cai Y, Sebe N, Hauptmann A (2012) Transfer knowledge adaptation for ad hoc multimedia event detection with few exemplars. In: ACM MM
Reddy K, Shah M (2012) Recognizing 50 human action categories of web videos. In: MVAP
Schmid C, Mohr R (1997) Local grayvalue invariants for image retrieval. In: TPAMI
Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local svm approach. In: ICPR
Snoek C, Worring M, Smeulders A (2005) Early versus late fusion in semantic video analysis. In: ACM MM
Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B (2006) Large scale multiple kernel learning. J Mach Learn Res 7:1531–1565
Vinokourov A, Shawe-Taylor J, Cristianini N (2002) Inferring a semantic representation of text via cross-language correlation analysis
Wang H, Kläser A, Schmid C, Liu C (2011) Action recognition by dense trajectories. In: CVPR
Xu Z, Yang Y, Tsang I, Sebe N, Hauptmann A (2013) Feature weighting via optimal thresholding for video analysis. In: ICCV
Yan R (2006) Probabilistic latent query analysis for combining multiple retrieval sources. In: Proceedings of the 29th international ACM SIGIR conference. ACM Press, pp 324–331
Yan Y, Xu Z, Liu G, Ma Z, Sebe N (2013) Glocal structural feature selection with sparsity for multimedia data understanding. In: ACM MM
Yang Y, Ma Z, Hauptmann A, Sebe N (2013) Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Trans Multimedia 15(3):321–377
Yang Y, Nie F, Xu D, Luo J, Zhuang Y, Pan Y (2012) A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Trans Pattern Anal Machine Intell 34(4):723–742
Yang Y, Song J, Huang Z, Ma Z, Sebe N, Hauptmann A (2013) Multi-feature fusion via hierarchical regression for multimedia analysis. IEEE Trans Multimedia 15(3):572–581
Yang Y, Xu D, Nie F, Luo J, Zhuang Y (2009) Ranking with local regression and global alignment for cross media retrieval. In: ACM MM
Yang Y, Zhuang Y, Wu F, Pan Y (2008) Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval. IEEE Trans Multimed 10(3):437–446
Zhan Y, Sun J, Niu D, Mao Q, Fan J (2014) A semi-supervised incremental learning method based on adaptive probabilistic hypergraph for video semantic detection. Multimed Tools Appl
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in neural information processing systems, vol 16. MIT Press, pp 321–328

Author information

Authors and Affiliations

Department of Computer Science, Zhejiang University, Zhejiang, China
Haoquan Shen, Shicheng Xu & Wenzhi Chen
Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Yan Yan
CEA and Mines-ParisTech, Paris, France
Nicolas Ballas

Corresponding author

Correspondence to Haoquan Shen.

About this article

Cite this article

Shen, H., Yan, Y., Xu, S. et al. Evaluation of semi-supervised learning method on action recognition. Multimed Tools Appl 74, 523–542 (2015). https://doi.org/10.1007/s11042-014-1936-z

Issue Date: January 2015
DOI: https://doi.org/10.1007/s11042-014-1936-z
