2016 | Original Paper | Book Chapter

Query-Focused Extractive Video Summarization

Authors: Aidean Sharghi, Boqing Gong, Mubarak Shah

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

Video data is growing explosively. As a result of this “big video data”, intelligent algorithms for automatic video summarization have (re-)emerged as a pressing need. We develop a probabilistic model, the Sequential and Hierarchical Determinantal Point Process (SH-DPP), for query-focused extractive video summarization. Given a user query and a long video sequence, our algorithm returns a summary by selecting key shots from the video. The decision to include a shot in the summary depends jointly on the shot’s relevance to the user query and its importance in the context of the video. We verify our approach on two densely annotated video datasets. Query-focused video summarization is particularly useful for search engines, e.g., to display snippets of videos.
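
To make the selection idea concrete, the following is a minimal sketch of shot selection with a plain determinantal point process (DPP), chosen greedily. It is not the SH-DPP described in the abstract, which is sequential and hierarchical. The function names, the product used to combine relevance and importance into one quality score, and the toy numbers are all assumptions made for illustration only.

```python
from typing import List, Sequence


def determinant(matrix: Sequence[Sequence[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def build_kernel(quality: Sequence[float],
                 features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Quality/diversity kernel: L[i][j] = q_i * <f_i, f_j> * q_j."""
    n = len(quality)
    return [
        [quality[i]
         * sum(x * y for x, y in zip(features[i], features[j]))
         * quality[j]
         for j in range(n)]
        for i in range(n)
    ]


def summarize(relevance: Sequence[float], importance: Sequence[float],
              features: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k shots that maximize det(L_S)."""
    # Assumption: quality is the product of query relevance and importance.
    quality = [r * s for r, s in zip(relevance, importance)]
    kernel = build_kernel(quality, features)
    selected: List[int] = []
    for _ in range(k):
        best_shot, best_det = None, 0.0
        for i in range(len(quality)):
            if i in selected:
                continue
            subset = selected + [i]
            sub = [[kernel[r][c] for c in subset] for r in subset]
            d = determinant(sub)
            if d > best_det:
                best_shot, best_det = i, d
        if best_shot is None:
            break  # no remaining shot adds anything non-redundant
        selected.append(best_shot)
    return sorted(selected)  # report shots in temporal order


# Toy data (hypothetical): shots 0 and 1 are visual near-duplicates.
relevance = [0.9, 0.8, 1.0, 0.6]
importance = [1.0, 1.0, 0.5, 0.5]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(summarize(relevance, importance, features, k=2))  # prints [0, 2]
```

In this toy run, shot 1 scores almost as high as shot 0 but is never chosen, because the determinant of a subset containing two identical feature vectors is zero. This illustrates how a DPP trades off per-shot quality against redundancy.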

Footnotes
1. It is also appealing to have the summary as a spatial-temporal synopsis or mosaic composed of multiple frames. However, compositional summarization is challenging and has so far achieved some success only in well-controlled environments [13].

References
1. Pritch, Y., Rav-Acha, A., Gutman, A., Peleg, S.: Webcam synopsis: peeking around the world. In: IEEE 11th International Conference on Computer Vision 2007, ICCV 2007, pp. 1–8. IEEE (2007)
2. Pal, C., Jojic, N.: Interactive montages of sprites for indexing and summarizing security video. In: IEEE Computer Society Conference on CVPR 2005, vol. 2. IEEE (2005)
3. Kang, H.W., Matsushita, Y., Tang, X., Chen, X.Q.: Space-time video montage. In: IEEE Computer Society Conference on CVPR 2006, vol. 2. IEEE (2006)
4. Jiang, R.M., Sadka, A.H., Crookes, D.: Advances in video summarization and skimming. In: Grgic, M., Delac, K., Ghanbari, M. (eds.) Recent Advances in Multimedia Signal Processing and Communications. SCI, vol. 231, pp. 27–50. Springer, Heidelberg (2009)
5. Rav-Acha, A., Pritch, Y., Peleg, S.: Making a long video short: dynamic video synopsis. In: 2006 IEEE Computer Society Conference on CVPR, vol. 1. IEEE (2006)
6. Goldman, D.B., Curless, B., Salesin, D., Seitz, S.M.: Schematic storyboarding for video visualization and editing. ACM Trans. Graph. (TOG) 25, 862–871 (2006). ACM
7. Liu, T., Kender, J.R.: Optimization algorithms for the selection of key frame sequences of variable length. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 403–417. Springer, Heidelberg (2002)
8. Aner, A., Kender, J.R.: Video summaries through mosaic-based shot and scene clustering. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 388–402. Springer, Heidelberg (2002)
9. Vasconcelos, N., Lippman, A.: A spatiotemporal motion model for video summarization. In: Proceedings of IEEE Computer Society Conference on CVPR 1998, pp. 361–366. IEEE (1998)
10. Wolf, W.: Key frame selection by motion analysis. In: Proceedings of 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1996, vol. 2, pp. 1228–1231. IEEE (1996)
11. Lee, K.M., Kwon, J.: A unified framework for event summarization and rare event detection. In: 2012 IEEE Conference on CVPR. IEEE (2012)
12. Cong, Y., Yuan, J., Luo, J.: Towards scalable summarization of consumer videos via sparse dictionary selection. IEEE Trans. Multimedia 14(1), 66–75 (2012)
13. Ngo, C., Ma, Y., Zhang, H.: Automatic video summarization by graph modeling. In: Proceedings of the Ninth IEEE International Conference on Computer Vision 2003. IEEE (2003)
14. Khosla, A., Hamid, R., Lin, C.J., Sundaresan, N.: Large-scale video summarization using web-image priors. In: Proceedings of the IEEE Conference on CVPR (2013)
15. Kim, G., Sigal, L., Xing, E.: Joint summarization of large-scale collections of web images and videos for storyline reconstruction. In: Proceedings of the IEEE Conference on CVPR (2014)
16. Xiong, B., Grauman, K.: Detecting snap points in egocentric video with a web photo prior. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 282–298. Springer, Heidelberg (2014)
17. Chu, W.S., Song, Y., Jaimes, A.: Video co-summarization: video summarization by visual co-occurrence. In: Proceedings of the IEEE Conference on CVPR (2015)
18. Song, Y., Vallmitjana, J., Stent, A., Jaimes, A.: TVSum: summarizing web videos using titles. In: Proceedings of the IEEE Conference on CVPR (2015)
19. Liu, W., Mei, T., Zhang, Y., Che, C., Luo, J.: Multi-task deep visual-semantic embedding for video thumbnail selection. In: Proceedings of the IEEE Conference on CVPR (2015)
20. Potapov, D., Douze, M., Harchaoui, Z., Schmid, C.: Category-specific video summarization. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VI. LNCS, vol. 8694, pp. 540–555. Springer, Heidelberg (2014)
21. Sun, M., Farhadi, A., Seitz, S.: Ranking domain-specific highlights by analyzing edited videos. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part I. LNCS, vol. 8689, pp. 787–802. Springer, Heidelberg (2014)
22. Xu, J., Mukherjee, L., Li, Y., Warner, J., Rehg, J.M., Singh, V.: Gaze-enabled egocentric video summarization via constrained submodular maximization. In: Proceedings of the IEEE Conference on CVPR (2015)
23. Gygli, M., Grabner, H., Riemenschneider, H., Van Gool, L.: Creating summaries from user videos. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VII. LNCS, vol. 8695, pp. 505–520. Springer, Heidelberg (2014)
24. Lu, Z., Grauman, K.: Story-driven summarization for egocentric video. In: Proceedings of the IEEE Conference on CVPR (2013)
25. Lee, Y.J., Grauman, K.: Predicting important objects for egocentric video summarization. Int. J. Comput. Vis. 114(1), 38–55 (2015)
26. Liu, D., Hua, G., Chen, T.: A hierarchical visual model for video object summarization. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2178–2190 (2010)
27. Gong, B., Chao, W.L., Grauman, K., Sha, F.: Diverse sequential subset selection for supervised video summarization. In: Advances in Neural Information Processing Systems, pp. 2069–2077 (2014)
28. Gygli, M., Grabner, H., Van Gool, L.: Video summarization by learning submodular mixtures of objectives. In: Proceedings of the IEEE Conference on CVPR (2015)
29. Nenkova, A., McKeown, K.: A survey of text summarization techniques. In: Aggarwal, C.C., Zhai, C.X. (eds.) Mining Text Data, pp. 43–76. Springer, Heidelberg (2012)
30.
31. Ghosh, J., Lee, Y.J., Grauman, K.: Discovering important people and objects for egocentric video summarization. In: 2012 IEEE Conference on CVPR. IEEE (2012)
32.
33. Daumé III., H., Marcu, D.: Bayesian query-focused summarization. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics (2006)
34. Schilder, F., Kondadadi, R.: Fastsum: fast and accurate query-based multi-document summarization. In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pp. 205–208. Association for Computational Linguistics (2008)
35. Gupta, S., Nenkova, A., Jurafsky, D.: Measuring importance and query relevance in topic-focused multi-document summarization. In: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions. Association for Computational Linguistics, pp. 193–196 (2007)
36. Ellouze, M., Boujemaa, N., Alimi, A.M.: IM(S)\(^{2}\): interactive movie summarization system. J. Vis. Commun. Image Represent. 21(4), 283–294 (2010)
37. Xiong, B., Kim, G., Sigal, L.: Storyline representation of egocentric videos with an applications to story-based search. In: Proceedings of the IEEE International CVPR (2015)
39. Chao, W.L., Gong, B., Grauman, K., Sha, F.: Large-margin determinantal point processes. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2015)
40.
41. Borth, D., Chen, T., Ji, R., Chang, S.F.: Sentibank: large-scale ontology and classifiers for detecting sentiment and emotions in visual content. In: Proceedings of the 21st ACM International Conference on Multimedia. ACM (2013)
42. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vision 42(3), 145–175 (2001)
43. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002)
44. Yu, F., Cao, L., Feris, R., Smith, J., Chang, S.F.: Designing category-level attributes for discriminative visual recognition. In: Proceedings of the IEEE Conference on CVPR (2013)
45. Lin, C.Y.: Rouge: A package for automatic evaluation of summaries. In: Proceedings of the ACL-04 Workshop, Text Summarization Branches Out, vol. 8 (2004)
46. Zhao, B., Xing, E.: Quasi real-time summarization for consumer videos. In: Proceedings of the IEEE Conference on CVPR (2014)
Metadata
Title
Query-Focused Extractive Video Summarization
Authors
Aidean Sharghi
Boqing Gong
Mubarak Shah
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46484-8_1