Published in: International Journal of Computer Vision 3/2017

09.02.2017

A Branch-and-Bound Framework for Unsupervised Common Event Discovery

Authors: Wen-Sheng Chu, Fernando De la Torre, Jeffrey F. Cohn, Daniel S. Messinger



Abstract

Event discovery aims to discover a temporal segment of interest, such as human behavior, actions, or activities. Most approaches to event discovery within or between time series use supervised learning. This becomes problematic when relevant event labels are unknown or difficult to detect, or when not all possible combinations of events have been anticipated. To overcome these problems, this paper explores Common Event Discovery (CED), a new problem that aims to discover common events of variable-length segments in an unsupervised manner. A naive solution to CED would search over all possible pairs of segments, incurring a prohibitive quartic cost. In this paper, we propose an efficient branch-and-bound (B&B) framework that avoids exhaustive search while guaranteeing a globally optimal solution. To this end, we derive novel bounding functions for various commonality measures and provide extensions to multiple commonality discovery and accelerated search. The B&B framework takes as input any multidimensional signal that can be quantified into histograms. A generalization of the framework can be readily applied to discover events at the same or different times (synchrony and event commonality, respectively). We also consider extensions to video search and supervised event detection. The effectiveness of the B&B framework is evaluated on motion capture of deliberate behavior and on video of spontaneous facial behavior in diverse interpersonal contexts: interviews, small groups of young adults, and parent-infant face-to-face interaction.
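The quartic search space and the bounding idea described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm or its derived bounds: it assumes histogram intersection as the commonality measure over two pre-quantized sequences, and uses the fact that histogram intersection is monotone in each argument, so the histogram of the largest segment a search node can still contain gives an admissible upper bound for every segment pair inside that node. All function names are hypothetical.

```python
import heapq
from collections import Counter

def hist_intersection(h1, h2):
    # Commonality measure: sum of elementwise minima of two histograms.
    return sum(min(c, h2[k]) for k, c in h1.items())

def discover_common_event(x, y, lmin=2):
    """Best-first branch-and-bound over all pairs of segments x[b1:e1],
    y[b2:e2] (each at least lmin long), maximizing histogram intersection.
    A node stores (lo, hi) index ranges for b1, e1, b2, e2."""
    def bound(node):
        # Evaluate the largest segments the node can still contain;
        # monotonicity makes this an upper bound for every contained pair.
        (b1, e1, b2, e2) = node
        h1 = Counter(x[b1[0]:e1[1]])
        h2 = Counter(y[b2[0]:e2[1]])
        return hist_intersection(h1, h2)

    def feasible(node):
        # Some begin/end combination must yield a segment of length >= lmin.
        (b1, e1, b2, e2) = node
        return e1[1] - b1[0] >= lmin and e2[1] - b2[0] >= lmin

    root = ((0, len(x) - lmin), (lmin, len(x)),
            (0, len(y) - lmin), (lmin, len(y)))
    heap = [(-bound(root), root)]
    while heap:
        neg_ub, node = heapq.heappop(heap)
        if all(lo == hi for lo, hi in node):
            # Fully resolved pair: the bound is the exact score, and it is
            # the largest key left in the queue, so the pair is optimal.
            b1, e1, b2, e2 = (iv[0] for iv in node)
            return -neg_ub, (b1, e1), (b2, e2)
        # Branch: split the widest of the four index ranges in half.
        i = max(range(4), key=lambda j: node[j][1] - node[j][0])
        lo, hi = node[i]
        mid = (lo + hi) // 2
        for child_iv in ((lo, mid), (mid + 1, hi)):
            child = node[:i] + (child_iv,) + node[i + 1:]
            if feasible(child):
                heapq.heappush(heap, (-bound(child), child))
    return None

score, seg_x, seg_y = discover_common_event("aabbbcc", "xxbbbzz")
```

Best-first expansion pops the node with the largest upper bound, so many of the quartically many segment pairs are never evaluated; the first fully resolved pair popped is globally optimal because its exact score is no smaller than any bound remaining in the queue.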


Appendix
Accessible only with authorization
Footnotes
1
Bold capital letters denote a matrix \(\mathbf {X}\), bold lower-case letters a column vector \(\mathbf {x}\). \(\mathbf {x}_i\) represents the ith column of the matrix \(\mathbf {X}\). \(x_{ij}\) denotes the scalar in the ith row and jth column of the matrix \(\mathbf {X}\). All non-bold letters represent scalars.
 
Metadata
Title
A Branch-and-Bound Framework for Unsupervised Common Event Discovery
Authors
Wen-Sheng Chu
Fernando De la Torre
Jeffrey F. Cohn
Daniel S. Messinger
Publication date
09.02.2017
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-017-0989-7
