Published in: International Journal of Computer Vision 3/2017

09.02.2017

A Branch-and-Bound Framework for Unsupervised Common Event Discovery

Authors: Wen-Sheng Chu, Fernando De la Torre, Jeffrey F. Cohn, Daniel S. Messinger



Abstract

Event discovery aims to discover a temporal segment of interest, such as human behavior, actions, or activities. Most approaches to event discovery within or between time series use supervised learning. This becomes problematic when relevant event labels are unknown or difficult to detect, or when not all possible combinations of events have been anticipated. To overcome these problems, this paper explores Common Event Discovery (CED), a new problem that aims to discover common events of variable-length segments in an unsupervised manner. A naive solution to CED would search over all possible pairs of segments, incurring a prohibitive quartic cost. In this paper, we propose an efficient branch-and-bound (B&B) framework that avoids exhaustive search while guaranteeing a globally optimal solution. To this end, we derive novel bounding functions for various commonality measures and provide extensions to multiple commonality discovery and accelerated search. The B&B framework takes as input any multidimensional signal that can be quantified into histograms. A generalization of the framework can be readily applied to discover events at the same or different times (synchrony and event commonality, respectively). We also consider extensions to video search and supervised event detection. The effectiveness of the B&B framework is evaluated on motion capture of deliberate behavior and on video of spontaneous facial behavior in diverse interpersonal contexts: interviews, small groups of young adults, and parent-infant face-to-face interaction.
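The quartic search space and the bounding idea described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm or its derived bounds: it assumes histogram intersection as the commonality measure over two pre-quantized sequences, and uses the fact that histogram intersection is monotone in each argument, so the histogram of the largest segment a search node can still contain gives an admissible upper bound for every segment pair inside that node. All function names are hypothetical.

```python
import heapq
from collections import Counter

def hist_intersection(h1, h2):
    # Commonality measure: sum of elementwise minima of two histograms.
    return sum(min(c, h2[k]) for k, c in h1.items())

def discover_common_event(x, y, lmin=2):
    """Best-first branch-and-bound over all pairs of segments x[b1:e1],
    y[b2:e2] (each at least lmin long), maximizing histogram intersection.
    A node stores (lo, hi) index ranges for b1, e1, b2, e2."""
    def bound(node):
        # Evaluate the largest segments the node can still contain;
        # monotonicity makes this an upper bound for every contained pair.
        (b1, e1, b2, e2) = node
        h1 = Counter(x[b1[0]:e1[1]])
        h2 = Counter(y[b2[0]:e2[1]])
        return hist_intersection(h1, h2)

    def feasible(node):
        # Some begin/end combination must yield a segment of length >= lmin.
        (b1, e1, b2, e2) = node
        return e1[1] - b1[0] >= lmin and e2[1] - b2[0] >= lmin

    root = ((0, len(x) - lmin), (lmin, len(x)),
            (0, len(y) - lmin), (lmin, len(y)))
    heap = [(-bound(root), root)]
    while heap:
        neg_ub, node = heapq.heappop(heap)
        if all(lo == hi for lo, hi in node):
            # Fully resolved pair: the bound is the exact score, and it is
            # the largest key left in the queue, so the pair is optimal.
            b1, e1, b2, e2 = (iv[0] for iv in node)
            return -neg_ub, (b1, e1), (b2, e2)
        # Branch: split the widest of the four index ranges in half.
        i = max(range(4), key=lambda j: node[j][1] - node[j][0])
        lo, hi = node[i]
        mid = (lo + hi) // 2
        for child_iv in ((lo, mid), (mid + 1, hi)):
            child = node[:i] + (child_iv,) + node[i + 1:]
            if feasible(child):
                heapq.heappush(heap, (-bound(child), child))
    return None

score, seg_x, seg_y = discover_common_event("aabbbcc", "xxbbbzz")
```

Best-first expansion pops the node with the largest upper bound, so many of the quartically many segment pairs are never evaluated; the first fully resolved pair popped is globally optimal because its exact score is no smaller than any bound remaining in the queue.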


Appendix
Accessible only with authorization
Footnotes
1
Bold capital letters denote a matrix \(\mathbf {X}\), bold lower-case letters a column vector \(\mathbf {x}\). \(\mathbf {x}_i\) represents the ith column of the matrix \(\mathbf {X}\). \(x_{ij}\) denotes the scalar in the ith row and jth column of the matrix \(\mathbf {X}\). All non-bold letters represent scalars.
 
Metadata
Title
A Branch-and-Bound Framework for Unsupervised Common Event Discovery
Authors
Wen-Sheng Chu
Fernando De la Torre
Jeffrey F. Cohn
Daniel S. Messinger
Publication date
09.02.2017
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-017-0989-7
