Published in: International Journal of Computer Vision 2/2014

1 April 2014

Max-Margin Early Event Detectors

Authors: Minh Hoai, Fernando De la Torre



Abstract

The need for early detection of temporal events from sequential data arises in a wide spectrum of applications ranging from human-robot interaction to video security. While temporal event detection has been extensively studied, early detection is a relatively unexplored problem. This paper proposes a maximum-margin framework for training temporal event detectors to recognize partial events, enabling early detection. Our method is based on Structured Output SVM, but extends it to accommodate sequential data. Experiments on datasets of varying complexity, for detecting facial expressions, hand gestures, and human activities, demonstrate the benefits of our approach.
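
The abstract describes detectors that can score a partially observed event, so that a decision is possible before the event ends. As a rough illustration of the test-time side only, the Python sketch below scores every segment that ends at the current frame with a linear function w·φ(segment) + b and fires as soon as the best score exceeds zero. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature map φ (here, the mean of per-frame feature vectors), the zero threshold, the segment-length limits, and all function names are hypothetical, and the weights w and b are assumed to come from a separately trained max-margin model whose training is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def segment_feature(prefix: List[List[float]], s: int, e: int) -> List[float]:
    """Hypothetical feature map phi: mean of frame features over frames s..e-1."""
    n = e - s
    return [(hi - lo) / n for hi, lo in zip(prefix[e], prefix[s])]


def linear_score(w: Vector, b: float, phi: Vector) -> float:
    """Linear detection score w . phi + b."""
    return sum(wi * xi for wi, xi in zip(w, phi)) + b


def early_detect(
    frames: Sequence[Vector],
    w: Vector,
    b: float,
    min_len: int = 1,
    max_len: Optional[int] = None,
) -> Optional[Tuple[int, int, float]]:
    """Process frames one at a time; at each time t, score all segments [s, t)
    that end at t and return (s, t, score) the first time the best score is
    positive. Returns None if no segment ever scores above 0."""
    prefix: List[List[float]] = [[0.0] * len(w)]  # prefix[k] = sum of the first k frames
    for t, frame in enumerate(frames, start=1):  # t = number of frames seen so far
        prefix.append([p + x for p, x in zip(prefix[-1], frame)])
        start_lo = 0 if max_len is None else max(0, t - max_len)
        best: Optional[Tuple[int, int, float]] = None
        for s in range(start_lo, t - min_len + 1):
            sc = linear_score(w, b, segment_feature(prefix, s, t))
            if best is None or sc > best[2]:
                best = (s, t, sc)
        if best is not None and best[2] > 0:
            return best  # fire now, possibly before the event has ended
    return None


# Toy 1-D stream: the "event" occupies frames 2..4 (0-based).
stream = [[0.0], [0.0], [1.0], [1.0], [1.0], [0.0]]
print(early_detect(stream, w=[1.0], b=-0.5))  # (2, 3, 0.5)
```

Because each new frame only adds segments that end at that frame, the loop can run as frames stream in; the prefix sums keep each segment feature at O(d) cost, so the per-frame work is O(L·d) for a maximum segment length L. In the toy stream, the detector fires after seeing only the first frame of the event, which is the behaviour early detection aims for; how reliably a real detector does this depends on how w and b were trained.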


Metadata
Title
Max-Margin Early Event Detectors
Authors
Minh Hoai
Fernando De la Torre
Publication date
1 April 2014
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-013-0683-3
