Published in: Pattern Recognition and Image Analysis 4/2020

01.10.2020 | MATHEMATICAL THEORY OF IMAGES AND SIGNALS REPRESENTING, PROCESSING, ANALYSIS, RECOGNITION, AND UNDERSTANDING

Image Captioning using Reinforcement Learning with BLUDEr Optimization

Authors: P. R. Devi, V. Thrivikraman, D. Kashyap, S. S. Shylaja


Abstract

Image captioning is a growing research area that has attracted considerable attention. It is a challenging task owing to the complexity of natural language generation and the difficulty of extracting features from a diverse collection of images. Many models have been proposed to tackle the problem, notably encoder-decoder (sequential CNN-RNN) systems, which have achieved strong results. More recently, reinforcement learning has emerged as a new approach and has surpassed many state-of-the-art paradigms. We propose a new reward, the BLUDEr metric, a linear combination of the non-differentiable metrics BLEU and CIDEr, and directly optimize it for our model on natural language generation tasks. In our experiments, we use the Flickr30k and Flickr8k datasets, two of the benchmark datasets for image captioning systems, and achieve state-of-the-art results on both when compared with other models.
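The BLUDEr reward described in the abstract is a linear combination of BLEU and CIDEr scores. The sketch below illustrates the combination only: it uses deliberately simplified stand-ins for the two metrics (unigram-precision BLEU with a brevity penalty, and a plain cosine similarity of count vectors in place of CIDEr's TF-IDF-weighted n-gram similarity), and the weights `alpha` and `beta` are illustrative assumptions, not values from the paper.

```python
from collections import Counter
import math

def bleu1(candidate: str, reference: str) -> float:
    # Simplified BLEU: unigram precision scaled by a brevity penalty.
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

def cider_like(candidate: str, reference: str) -> float:
    # Toy stand-in for CIDEr: cosine similarity of unigram count vectors
    # (real CIDEr uses TF-IDF-weighted n-gram statistics over a corpus).
    c, r = Counter(candidate.split()), Counter(reference.split())
    dot = sum(c[w] * r[w] for w in set(c) | set(r))
    norm = math.sqrt(sum(v * v for v in c.values())) * \
           math.sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0

def bluder_reward(candidate: str, reference: str,
                  alpha: float = 0.5, beta: float = 0.5) -> float:
    # BLUDEr-style reward: a weighted sum of the two metric scores.
    return alpha * bleu1(candidate, reference) + beta * cider_like(candidate, reference)
```

In a self-critical training setup such as that of Rennie et al., a scalar reward of this form would be computed for both a sampled caption and a greedily decoded baseline caption, and their difference used to scale the policy-gradient update, which is how a non-differentiable metric can be optimized directly.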


Metadata
Title
Image Captioning using Reinforcement Learning with BLUDEr Optimization
Authors
P. R. Devi
V. Thrivikraman
D. Kashyap
S. S. Shylaja
Publication date
01.10.2020
Publisher
Pleiades Publishing
Published in
Pattern Recognition and Image Analysis / Issue 4/2020
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI
https://doi.org/10.1134/S1054661820040094
