2024 | Original Paper | Book Chapter

Evaluating Image Similarity Using Contextual Information of Images with Pre-trained Models

Authors: Juyeon Kim, Sungwon Park, Byunghoon Park, B. Sooyeon Shin

Published in: Mobile, Secure, and Programmable Networking

Publisher: Springer Nature Switzerland

Abstract

This study proposes an integrated approach to image similarity measurement that extends traditional methods, which concentrate on local features, to incorporate global information. Global information, including background, colors, spatial representation, and object relations, makes it possible to distinguish similarity based on the overall context of an image using natural language processing techniques. We employ the Video-LLaMA model to extract textual descriptions of images through question prompts and apply BERTScore, a cosine-similarity-based metric, to quantify image similarities. We conduct experiments on images of the same and different topics using various pre-trained language model configurations. To validate the coherence of the generated text descriptions with the actual theme of each image, we generate images using DALL-E 2 and evaluate them using human judgment. Key findings demonstrate the effectiveness of pre-trained language models in distinguishing between images depicting similar and different topics, with a clear gap in similarity.
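The scoring step mentioned in the abstract can be illustrated with a small sketch. BERTScore, as commonly described, matches each token embedding of one text to its most similar token embedding in the other text by cosine similarity, then averages the matches into precision, recall, and F1. The code below is only an illustration under strong assumptions: the token vectors are hand-made toy values rather than BERT outputs, the function names `cosine` and `greedy_match_score` are hypothetical, and optional refinements such as idf weighting are left out. It is not the chapter's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_score(
    reference: List[Vector], candidate: List[Vector]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from greedy cosine matching.

    Each inner vector stands in for one token embedding of a description.
    """
    # Recall: every reference token is matched to its best candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    # Precision: every candidate token is matched to its best reference token.
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy "token embeddings" for three image descriptions (assumed values).
desc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # description of image A
desc_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]  # same topic as A
desc_c = [[0.0, 0.0, 1.0], [0.0, 0.2, 1.0]]  # different topic

_, _, f1_same = greedy_match_score(desc_a, desc_b)
_, _, f1_diff = greedy_match_score(desc_a, desc_c)
print(f"same topic F1: {f1_same:.3f}, different topic F1: {f1_diff:.3f}")
```

In this toy setup the same-topic pair scores far higher than the different-topic pair, which mirrors the kind of similarity gap the abstract reports. With a real model, the vectors would be contextual embeddings of the generated descriptions.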




Title: Evaluating Image Similarity Using Contextual Information of Images with Pre-trained Models
Authors: Juyeon Kim, Sungwon Park, Byunghoon Park, B. Sooyeon Shin
Publisher: Springer Nature Switzerland
Book: Mobile, Secure, and Programmable Networking
Print ISBN: 978-3-031-52425-7
Electronic ISBN: 978-3-031-52426-4
Copyright Year: 2024
DOI: https://doi.org/10.1007/978-3-031-52426-4_13
