01.01.2014 | Special Issue Paper

E-LAMP: integration of innovative ideas for multimedia event detection

Authors: Wei Tong, Yi Yang, Lu Jiang, Shoou-I Yu, ZhenZhong Lan, Zhigang Ma, Waito Sze, Ehsan Younessian, Alexander G. Hauptmann

Published in: Machine Vision and Applications | Issue 1/2014

Abstract

Detecting multimedia events in web videos is an emerging research area in multimedia and computer vision. In this paper, we introduce the core methods and technologies of the framework we recently developed for our Event Labeling through Analytic Media Processing (E-LAMP) system, which addresses different aspects of the overall event detection problem. First, we have developed efficient feature extraction methods so that we can handle large video collections comprising thousands of hours of video. Second, we represent the extracted raw features in a spatial bag-of-words model with more effective tilings, so that the spatial layout information of different features and different events is better captured and the overall detection performance improves. Third, in contrast to the widely used early and late fusion schemes, we develop a novel algorithm that learns a more robust and discriminative intermediate feature representation from multiple features, so that better event models can be built upon it. Finally, to tackle the additional challenge of event detection with only very few positive exemplars, we have developed a novel algorithm that effectively adapts knowledge learnt from auxiliary sources to assist event detection. Both our empirical results and the official evaluation results on TRECVID MED’11 and MED’12 demonstrate the excellent performance of the integration of these ideas.
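To make the spatial bag-of-words idea concrete, the following is a minimal sketch of how quantized local features might be pooled per tile and concatenated. It is an illustration under stated assumptions, not the E-LAMP implementation: the function name `spatial_bow`, its parameters, and the example tilings are hypothetical, and the abstract does not say which tilings the system actually uses.

```python
import numpy as np

def spatial_bow(points, words, vocab_size, frame_size, tiling):
    """Concatenate one L1-normalised visual-word histogram per tile.

    points     : (n, 2) keypoint positions (x, y) in pixels
    words      : (n,) visual-word index of each keypoint, in [0, vocab_size)
    frame_size : (width, height) of the frame in pixels
    tiling     : (cols, rows) grid laid over the frame (hypothetical choice)
    """
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    words = np.asarray(words, dtype=int)
    width, height = frame_size
    cols, rows = tiling

    # Assign each keypoint to a grid cell; clip so points on the
    # right or bottom border fall into the last cell.
    col = np.clip((points[:, 0] / width * cols).astype(int), 0, cols - 1)
    row = np.clip((points[:, 1] / height * rows).astype(int), 0, rows - 1)
    tile = row * cols + col

    # Count word occurrences per tile (np.add.at handles repeated indices).
    hist = np.zeros((rows * cols, vocab_size))
    np.add.at(hist, (tile, words), 1.0)

    # L1-normalise each tile histogram; empty tiles stay all-zero.
    sums = hist.sum(axis=1, keepdims=True)
    hist = np.divide(hist, sums, out=np.zeros_like(hist), where=sums > 0)
    return hist.ravel()

# Example: three keypoints, a two-word vocabulary, a 640x480 frame.
pts = [[10, 10], [630, 470], [320, 240]]
wds = [0, 1, 1]
# Assumed set of tilings: whole frame, 2x2 grid, three horizontal bands.
feature = np.concatenate(
    [spatial_bow(pts, wds, 2, (640, 480), t) for t in [(1, 1), (2, 2), (1, 3)]]
)
```

Concatenating the histograms of several tilings yields one fixed-length vector per frame, here of length 2 × (1 + 4 + 3) = 16, which can then be passed to a classifier. Which tilings work best is an empirical question that this sketch does not answer.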

Title: E-LAMP: integration of innovative ideas for multimedia event detection
Authors: Wei Tong, Yi Yang, Lu Jiang, Shoou-I Yu, ZhenZhong Lan, Zhigang Ma, Waito Sze, Ehsan Younessian, Alexander G. Hauptmann
Publication date: 01.01.2014
Publisher: Springer Berlin Heidelberg
Published in: Machine Vision and Applications / Issue 1/2014
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI: https://doi.org/10.1007/s00138-013-0529-6
