Pattern Analysis and Applications

1 May 2016 | Theoretical Advances

On-line deep learning method for action recognition

Authors: Konstantinos Charalampous, Antonios Gasteratos

Published in: Pattern Analysis and Applications | Issue 2/2016

Abstract

This paper proposes an unsupervised on-line deep learning algorithm for action recognition in video sequences. Deep learning models capable of describing spatio-temporal data have been proposed in the past with remarkable results, yet they are mostly restricted to building features from a short temporal window. The model presented here, by contrast, considers the entire sample sequence and extracts its description frame by frame. Each computational node of the proposed architecture groups its inputs into clusters and computes a point representative for each cluster. A first-order transition matrix then stores and continuously updates the successive transitions among the clusters. Spatial and temporal information are treated jointly by the Viterbi algorithm, which maximizes a criterion based on (a) the temporal transitions and (b) the similarity of the respective input sequence to the cluster representatives. The derived Viterbi path is the node's output, and the concatenation of nine neighboring paths constitutes the input to the corresponding upper-level node. Combining Adaptive Resonance Theory (ART) and the Viterbi algorithm in a deep learning architecture, done here for the first time, leads to a substantially different approach to action recognition. In most cases, the method is shown to outperform other deep learning methodologies in terms of classification accuracy.
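
The interplay of the transition matrix and the Viterbi algorithm within a single node can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact criterion, so the score used here (log transition probability plus negative squared Euclidean distance to each representative) is an assumption, as are the names `update_transitions`, `viterbi_path` and `smoothing`. The cluster representatives are taken as given rather than learned with ART.

```python
import numpy as np

def update_transitions(counts, prev_cluster, next_cluster):
    """Increment the first-order transition count prev -> next in place."""
    counts[prev_cluster, next_cluster] += 1.0
    return counts

def viterbi_path(frames, representatives, counts, smoothing=1.0):
    """Return the cluster sequence with the best transition + similarity score."""
    frames = np.asarray(frames, dtype=float)
    reps = np.asarray(representatives, dtype=float)
    n_frames, n_clusters = len(frames), len(reps)

    # Row-normalize the smoothed counts into log transition probabilities.
    smoothed = counts + smoothing
    log_trans = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

    # Assumed similarity: negative squared Euclidean distance of every
    # frame to every cluster representative, shape (n_frames, n_clusters).
    diffs = frames[:, None, :] - reps[None, :, :]
    log_sim = -np.sum(diffs ** 2, axis=2)

    score = np.empty((n_frames, n_clusters))
    backptr = np.zeros((n_frames, n_clusters), dtype=int)
    score[0] = log_sim[0]
    for t in range(1, n_frames):
        # candidate[i, j]: best score ending in cluster i at t-1, then i -> j.
        candidate = score[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(candidate, axis=0)
        score[t] = candidate[backptr[t], np.arange(n_clusters)] + log_sim[t]

    # Backtrack from the best final cluster to recover the full path.
    path = [int(np.argmax(score[-1]))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy data (hypothetical): two clusters in 2-D and four frames.
reps = np.array([[0.0, 0.0], [1.0, 1.0]])
counts = np.zeros((2, 2))
for a, b in [(0, 0), (0, 1), (1, 1)]:
    update_transitions(counts, a, b)
frames = np.array([[0.1, 0.0], [0.0, 0.1], [0.9, 1.0], [1.0, 0.9]])
print(viterbi_path(frames, reps, counts))  # prints [0, 0, 1, 1]
```

In the architecture described above, the returned path would be the node's output, and nine such paths from neighboring nodes would be concatenated to form the input of the upper-level node.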

Title: On-line deep learning method for action recognition
Authors: Konstantinos Charalampous
Antonios Gasteratos
Publication date: 1 May 2016
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 2/2016
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-014-0404-8
