Multimedia maximal marginal relevance for multi-video summarization

Li, Yingbo; Merialdo, Bernard

doi:10.1007/s11042-014-2287-5

Published: 10 October 2014

Volume 75, pages 199–220, (2016)

Multimedia Tools and Applications

Abstract

In this paper we propose several novel algorithms for multi-video summarization. The first and fundamental algorithm, Video Maximal Marginal Relevance (Video-MMR), follows the principle of Maximal Marginal Relevance (MMR), a classical text summarization algorithm. Video-MMR rewards relevant keyframes and penalizes redundant keyframes, relying only on visual features. We extend Video-MMR to Audio Video Maximal Marginal Relevance (AV-MMR) by exploiting audio features. We also propose Balanced AV-MMR, which exploits additional semantic features, the balance between audio and visual information, and the balance of temporal information across the different videos of a set. The proposed algorithms are generic and suitable for summarizing various video genres in a multi-video set using multimodal information. Large-scale subjective and objective evaluations show that our series of MMR algorithms for multi-video summarization is effective.
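The reward-and-penalize principle described in the abstract can be illustrated with a greedy MMR-style selection over keyframe feature vectors. The following is only a minimal sketch, not the authors' Video-MMR implementation. The function names, the cosine similarity measure, the weight `lam`, and the definition of relevance as mean similarity to the other unselected frames are all assumptions made for illustration.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between row vectors (assumed similarity measure)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def mmr_select(features, k, lam=0.7):
    """Greedily pick k keyframe indices, trading relevance against redundancy.

    features: (n, d) array of hypothetical visual feature vectors, one per keyframe.
    lam:      weight in [0, 1]; higher values favor relevance over diversity.
    """
    sim = cosine_similarity_matrix(np.asarray(features, dtype=float))
    selected = []
    remaining = list(range(sim.shape[0]))
    while remaining and len(selected) < k:
        best, best_score = None, -np.inf
        for i in remaining:
            others = [j for j in remaining if j != i]
            # Reward: how well frame i represents the frames not yet selected.
            relevance = sim[i, others].mean() if others else 0.0
            # Penalty: similarity to the closest frame already in the summary.
            redundancy = max(sim[i, j] for j in selected) if selected else 0.0
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Two near-duplicate frames and one distinct frame: with k=2 the distinct
# frame should be kept alongside only one of the duplicates.
frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
summary = mmr_select(frames, k=2, lam=0.5)
```

Lowering `lam` makes the selection more diverse, while raising it favors frames that resemble the bulk of the remaining content.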

Author information

Authors and Affiliations

EURECOM, Sophia Antipolis, France
Yingbo Li & Bernard Merialdo

Corresponding author

Correspondence to Yingbo Li.

About this article

Cite this article

Li, Y., Merialdo, B. Multimedia maximal marginal relevance for multi-video summarization. Multimed Tools Appl 75, 199–220 (2016). https://doi.org/10.1007/s11042-014-2287-5

Received: 26 November 2013
Revised: 13 August 2014
Accepted: 17 September 2014
Published: 10 October 2014
Issue Date: January 2016
DOI: https://doi.org/10.1007/s11042-014-2287-5
