
2018 | Original Paper | Book Chapter

Attend to Knowledge: Memory-Enhanced Attention Network for Image Captioning

Authors: Hui Chen, Guiguang Ding, Zijia Lin, Yuchen Guo, Jungong Han

Published in: Advances in Brain Inspired Cognitive Systems

Publisher: Springer International Publishing


Abstract

Image captioning, which aims to automatically generate sentences describing images, has been explored in many works. Attention-based methods have achieved impressive performance owing to their superior ability to adapt image features to the context dynamically. Since recurrent neural networks have difficulty remembering information from the distant past, we argue that the attention model may not be adequately supervised by guidance from such distant previous information. In this paper, we propose a memory-enhanced attention model for image captioning, aiming to improve the attention mechanism with previously learned knowledge. Specifically, we store the visual and semantic knowledge exploited in the past into memories, and generate a global visual or semantic feature to improve the attention model. We verify the effectiveness of the proposed model on two prevalent benchmark datasets, MS COCO and Flickr30k. Comparison with state-of-the-art models demonstrates the superiority of the proposed model.
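The abstract describes the core mechanism: features exploited at past decoding steps are written into a memory, the memory is summarized into a global feature, and that global feature guides the attention over image regions at each step. The chapter itself gives the exact formulation; what follows is only a minimal NumPy sketch of that idea under simplifying assumptions (one shared set of additive-attention weights, a visual memory only, and a toy stand-in for the LSTM update). All function and variable names here are hypothetical, not the authors' notation.

```python
# Minimal sketch of memory-enhanced attention for captioning (assumptions
# noted above; this is illustrative, not the paper's actual model).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(feats, query, W_f, W_q, w):
    # Additive attention over a set of feature vectors, conditioned on a query.
    scores = np.tanh(feats @ W_f + query @ W_q) @ w   # one score per feature
    return softmax(scores) @ feats                    # weighted average

def memory_read(memory, query, W_f, W_q, w):
    # Summarize past attended features into a single "global" feature,
    # again via attention conditioned on the current query.
    if not memory:
        return np.zeros_like(query)                   # empty memory at t = 0
    return attend(np.stack(memory), query, W_f, W_q, w)

rng = np.random.default_rng(0)
d = 8                                   # toy feature dimension
regions = rng.normal(size=(5, d))       # CNN region features for one image
h = rng.normal(size=d)                  # current decoder hidden state
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
w1 = rng.normal(size=d)

memory = []                             # stores features attended in the past
for t in range(3):                      # a few decoding steps
    g = memory_read(memory, h, W1, W2, w1)       # global feature from memory
    v = attend(regions, h + g, W1, W2, w1)       # memory-enhanced attention
    memory.append(v)                             # write back into memory
    h = np.tanh(h + v)                           # stand-in for the LSTM update
print("attended feature at last step:", v.round(3))
```

The key design point the sketch tries to capture is that the attention query is no longer the hidden state alone: it is augmented with a summary of everything attended so far, so guidance from distant steps survives even when the recurrent state has forgotten it.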


Metadata
Title
Attend to Knowledge: Memory-Enhanced Attention Network for Image Captioning
Authors
Hui Chen
Guiguang Ding
Zijia Lin
Yuchen Guo
Jungong Han
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-00563-4_16
