Published in: International Journal of Multimedia Information Retrieval 1/2015

1 March 2015 | Regular Paper

Weakly supervised detection of video events using hidden conditional random fields

Authors: Kimiaki Shirahama, Marcin Grzegorzek, Kuniaki Uehara


Abstract

Multimedia Event Detection (MED) is the task of identifying videos in which a certain event occurs. This paper addresses two problems in MED: the weakly supervised setting and unclear event structure. The first arises because associating individual shots with the event is laborious and subject to annotator subjectivity, so training videos are only loosely annotated as to whether or not they contain the event; which shots are relevant or irrelevant to the event is unknown. The second is the difficulty of assuming the event structure in advance, owing to arbitrary camera and editing techniques. To tackle these problems, we propose a method using a Hidden Conditional Random Field (HCRF), a probabilistic discriminative classifier with a set of hidden states. We handle the weakly supervised setting by using hidden states as an intermediate layer that discriminates between shots relevant and irrelevant to the event. In addition, the unclear structure of the event can be exposed by the features of each hidden state and its relations to the other states. Based on this idea, we optimise the hidden states and their relations so as to distinguish training videos containing the event from the others. Furthermore, to exploit the full potential of HCRFs, we establish approaches for training video preparation, parameter initialisation and fusion of multiple HCRFs. Experimental results on TRECVID video data validate the effectiveness of our method.
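To make the role of the hidden states concrete, the following is a minimal sketch of how a linear-chain HCRF can turn a sequence of shot features into a video-level event probability by summing over all hidden-state assignments. It assumes the standard HCRF potential of Quattoni et al. (2007) and is not the authors' implementation; the function names, the parameter layout (`weight`, `label_state`, `transition`) and the toy numbers are all illustrative assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def class_log_score(shots, y, weight, label_state, transition):
    """Log of the sum over all hidden-state sequences of exp(potential).

    Assumed potential (standard linear-chain HCRF, not taken from the paper):
      sum_j weight[h_j] . x_j + sum_j label_state[y][h_j]
      + sum_j transition[y][h_j][h_{j+1}]
    `shots` must be a non-empty list of feature vectors, one per shot.
    """
    n_states = len(weight)

    def unary(x, h):
        return sum(w * f for w, f in zip(weight[h], x)) + label_state[y][h]

    # Forward recursion in log space: alpha[h] covers all paths ending in h.
    alpha = [unary(shots[0], h) for h in range(n_states)]
    for x in shots[1:]:
        alpha = [
            unary(x, h)
            + logsumexp([alpha[g] + transition[y][g][h] for g in range(n_states)])
            for h in range(n_states)
        ]
    return logsumexp(alpha)


def event_posterior(shots, weight, label_state, transition):
    """Return [P(y=0 | shots), P(y=1 | shots), ...] for one video."""
    scores = [
        class_log_score(shots, y, weight, label_state, transition)
        for y in range(len(label_state))
    ]
    z = logsumexp(scores)
    return [math.exp(s - z) for s in scores]


# Toy video: 3 shots, 2 concept detection scores per shot, 2 hidden states.
toy_shots = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
toy_weight = [[1.0, -1.0], [-0.5, 0.5]]            # one vector per hidden state
toy_label_state = [[0.2, -0.2], [-0.3, 0.4]]       # indexed [label][state]
toy_transition = [[[0.1, 0.0], [0.0, 0.1]],        # indexed [label][from][to]
                  [[-0.1, 0.3], [0.2, 0.0]]]
print(event_posterior(toy_shots, toy_weight, toy_label_state, toy_transition))
```

Only the video-level label enters the training objective in such a model; no shot ever receives its own label, which is what makes the hidden layer a natural fit for the weakly supervised setting described above.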

Footnotes
1
It is not reasonable to initialise \(\varvec{\theta }_\mathrm{weight}(h_{i})\) as the centre of the \(i\)th cluster because their value ranges differ. While \(\varvec{\theta }_\mathrm{weight}(h_{i})\) takes both positive and negative values, the cluster centre takes no negative values because concept detection scores lie between \(0\) and \(1\).
 
2
We also tested PCA, to make the dimensions (concepts) independent of one another, and normalisation, to give every dimension zero mean and unit variance. However, neither worked well. A plausible explanation is that the detection scores for each concept are already appropriately biased by the detector, so altering their distribution offers no improvement.
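For reference, the normalisation mentioned in this footnote can be sketched as a per-concept z-score. This is a generic illustration using the Python standard library, not the authors' code; the function name `standardise` and the toy scores are assumptions.

```python
import statistics


def standardise(scores):
    """Rescale each column (concept) to zero mean and unit variance.

    `scores` is a list of shots, each a list of concept detection scores.
    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*scores))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in scores
    ]


# Toy input: 3 shots, 2 concepts, scores in [0, 1].
toy_scores = [[0.1, 0.9], [0.3, 0.5], [0.8, 0.4]]
print(standardise(toy_scores))
```

After this transform the scores no longer lie in \([0, 1]\), which illustrates how such preprocessing alters the distribution that the detector produced.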
 
Metadata
Title
Weakly supervised detection of video events using hidden conditional random fields
Authors
Kimiaki Shirahama
Marcin Grzegorzek
Kuniaki Uehara
Publication date
1 March 2015
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 1/2015
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-014-0068-6
