2016 | OriginalPaper | Book chapter

Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors

Authors: Johanna Carvajal, Chris McCool, Brian Lovell, Conrad Sanderson

Published in: Trends and Applications in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlapping temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from each window are represented as a Fisher vector, which captures first and second order statistics. Instead of directly classifying each Fisher vector, it is converted into a vector of class probabilities. The final classification decision for each frame is then obtained by integrating the class probabilities at the frame level, which exploits the overlapping of the temporal windows. Experiments were performed on two datasets: s-KTH (a stitched version of the KTH dataset to simulate multi-actions), and the challenging CMU-MMAC dataset. On s-KTH, the proposed approach achieves an accuracy of 85.0%, significantly outperforming two recent approaches based on GMMs and HMMs, which obtained 78.3% and 71.2%, respectively. On CMU-MMAC, the proposed approach achieves an accuracy of 40.9%, outperforming the GMM and HMM approaches, which obtained 33.7% and 38.4%, respectively. Furthermore, the proposed system is on average 40 times faster than the GMM-based approach.
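
The frame-level integration step described in the abstract can be illustrated with a minimal sketch. It assumes that each overlapping window has already been converted into a vector of class probabilities, and that integration means averaging the probabilities of all windows covering a frame. The abstract does not state the exact integration rule, so the averaging, the function name `integrate_window_probabilities`, and the toy numbers are assumptions made for illustration only.

```python
import numpy as np

def integrate_window_probabilities(window_probs, window_starts, window_len, n_frames):
    """Assign a label to each frame by averaging the class probabilities
    of every temporal window that covers it (assumed integration rule)."""
    window_probs = np.asarray(window_probs, dtype=float)
    n_classes = window_probs.shape[1]
    acc = np.zeros((n_frames, n_classes))   # summed probabilities per frame
    counts = np.zeros(n_frames)             # number of windows covering each frame

    for probs, start in zip(window_probs, window_starts):
        end = min(start + window_len, n_frames)
        acc[start:end] += probs
        counts[start:end] += 1

    covered = counts > 0
    acc[covered] = acc[covered] / counts[covered][:, None]

    labels = np.full(n_frames, -1)          # -1 marks frames no window covers
    labels[covered] = acc[covered].argmax(axis=1)
    return labels, acc

# Toy example: 10 frames, 2 classes, windows of 4 frames with a stride of 2.
probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
starts = [0, 2, 4, 6]
labels, frame_probs = integrate_window_probabilities(probs, starts, window_len=4, n_frames=10)
print(labels.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

In this toy case the action boundary falls between frames 3 and 4: frames covered by two windows take the averaged evidence of both, so segmentation and classification come out of the same per-frame decision.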

Metadata
Title
Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors
Authors
Johanna Carvajal
Chris McCool
Brian Lovell
Conrad Sanderson
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-42996-0_10