2019 | Original Paper | Book Chapter

Summarizing Videos with Attention

Authors: Jiri Fajtl, Hajar Sadeghi Sokeh, Vasileios Argyriou, Dorothy Monekosso, Paolo Remagnino

Published in: Computer Vision – ACCV 2018 Workshops

Publisher: Springer International Publishing


Abstract

In this work we propose a novel method for supervised, keyshot-based video summarization that applies a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bidirectional recurrent networks, such as BiLSTM, combined with attention; these networks are complex to implement and computationally demanding compared to fully connected networks. To that end, we propose a simple self-attention-based network for video summarization that performs the entire sequence-to-sequence transformation in a single feed-forward pass, with a single backward pass during training. Our method sets new state-of-the-art results on the TVSum and SumMe benchmarks, commonly used in this domain.
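The abstract's core idea, replacing recurrent encoders with one soft self-attention pass over all frame features at once, can be sketched as below. This is an illustrative approximation, not the authors' exact architecture: the projection matrices, the attention dimension `d_k`, and the sigmoid scoring head are randomly initialised stand-ins for parameters that would be learned, and the feature dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_scores(features, d_k=64, seed=0):
    """Soft self-attention over per-frame features, mapping each frame
    to an importance score in (0, 1) in a single feed-forward pass.

    features: (n_frames, d) array, e.g. CNN features of sampled frames.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    # Random projections stand in for learned query/key/value weights.
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q, K, V = features @ W_q, features @ W_k, features @ W_v
    # Scaled dot-product attention: every frame attends to every frame,
    # so no recurrence over the sequence is needed.
    A = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)   # (n, n), rows sum to 1
    context = A @ V                                # (n, d_k) attended features
    # A linear head plus sigmoid yields a per-frame importance score.
    w_out = rng.standard_normal((d_k, 1)) / np.sqrt(d_k)
    scores = 1.0 / (1.0 + np.exp(-(context @ w_out)))
    return scores.ravel(), A

# Toy input: 8 frames with 32-dimensional features.
scores, A = self_attention_scores(
    np.random.default_rng(1).standard_normal((8, 32)))
```

A keyshot summary would then be assembled by selecting shots whose frames received the highest scores, subject to a summary-length budget.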


Metadata
Title
Summarizing Videos with Attention
Authors
Jiri Fajtl
Hajar Sadeghi Sokeh
Vasileios Argyriou
Dorothy Monekosso
Paolo Remagnino
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-21074-8_4
