
Multimedia maximal marginal relevance for multi-video summarization

Multimedia Tools and Applications

Abstract

In this paper we propose several novel algorithms for multi-video summarization. The first and fundamental algorithm, Video Maximal Marginal Relevance (Video-MMR), mimics the principle of a classical text summarization algorithm, Maximal Marginal Relevance (MMR). Video-MMR rewards relevant keyframes and penalizes redundant ones, relying only on visual features. We extend Video-MMR to Audio Video Maximal Marginal Relevance (AV-MMR) by exploiting audio features. We also propose Balanced AV-MMR, which exploits additional semantic features, the balance between audio and visual information, and the balance of temporal information across the different videos of a set. The proposed algorithms are generic and suitable for summarizing various video genres in a multi-video set using multimodal information. Large-scale subjective and objective evaluations show that our series of MMR algorithms for multi-video summarization is effective.
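
The reward-and-penalty principle described in the abstract can be illustrated with a minimal sketch of greedy MMR-style keyframe selection. This is not the authors' implementation: the abstract does not give the Video-MMR formula, so the scoring rule here follows the classical MMR form (a weighted relevance term minus a weighted redundancy term). The function names `cosine` and `mmr_select`, the weight `lam`, the use of cosine similarity on per-frame feature vectors, and the definition of relevance as mean similarity to all other frames are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mmr_select(features: List[Sequence[float]], k: int, lam: float = 0.7) -> List[int]:
    """Greedily pick up to k keyframe indices with an MMR-style score.

    score(f) = lam * relevance(f) - (1 - lam) * redundancy(f)
    relevance(f):  mean similarity of f to all other frames (assumption)
    redundancy(f): max similarity of f to any already selected frame
    """
    n = len(features)
    # Precompute the pairwise similarity matrix between all candidate frames.
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    selected: List[int] = []
    remaining = set(range(n))

    def score(i: int) -> float:
        others = [j for j in range(n) if j != i]
        relevance = sum(sim[i][j] for j in others) / len(others) if others else 0.0
        redundancy = max((sim[i][j] for j in selected), default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: three identical frames and one visually distinct frame.
frames = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
with_penalty = mmr_select(frames, k=2, lam=0.5)
without_penalty = mmr_select(frames, k=2, lam=1.0)
```

On this toy input, `with_penalty` is `[0, 3]`: after frame 0 is chosen, its duplicates are penalized and the distinct frame is preferred. With `lam=1.0` the redundancy term vanishes and `without_penalty` is `[0, 1]`, a duplicate of the first pick.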



Author information

Corresponding author

Correspondence to Yingbo Li.

About this article

Cite this article

Li, Y., Merialdo, B. Multimedia maximal marginal relevance for multi-video summarization. Multimed Tools Appl 75, 199–220 (2016). https://doi.org/10.1007/s11042-014-2287-5

  • DOI: https://doi.org/10.1007/s11042-014-2287-5
