2020 | Original Paper | Book Chapter

UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning

Authors: Quan Hoang Lam, Quang Duy Le, Van Kiet Nguyen, Ngan Luu-Thuy Nguyen

Published in: Computational Collective Intelligence

Publisher: Springer International Publishing

Abstract

Image Captioning (IC), the task of automatically generating captions for images, has in recent years attracted attention from researchers in several fields of computer science, including computer vision, natural language processing, and machine learning. This paper contributes to Image Captioning research by extending the available data to another language, Vietnamese. No Image Captioning dataset has existed for Vietnamese so far, so building one is a fundamental first step toward Vietnamese Image Captioning. To that end, we first built a dataset of manually written captions for images from the Microsoft COCO dataset that depict sports played with balls; we call this dataset UIT-ViIC (University of Information Technology - Vietnamese Image Captions). UIT-ViIC consists of 19,250 Vietnamese captions for 3,850 images. We then evaluated our dataset with deep neural network models and compared it with an English dataset and two Vietnamese datasets built by different methods. UIT-ViIC is published on our lab website (https://sites.google.com/uit.edu.vn/uit-nlp/) for research purposes.
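The dataset figures in the abstract imply a fixed number of captions per image. The sketch below checks that arithmetic and shows how captions might be grouped by image. It is a minimal illustration only: the record layout (`image_id`, `caption`) is assumed to resemble COCO-style caption annotations, which the abstract does not confirm, and the toy records and the helper `group_captions` are hypothetical placeholders, not entries or tools from UIT-ViIC.

```python
from collections import defaultdict

# Figures stated in the abstract.
NUM_CAPTIONS = 19_250
NUM_IMAGES = 3_850

# Average number of captions per image implied by those figures.
captions_per_image = NUM_CAPTIONS / NUM_IMAGES

# Toy records in an ASSUMED COCO-style layout. The IDs and captions are
# placeholders invented for illustration, not data from UIT-ViIC.
toy_annotations = [
    {"image_id": 1, "caption": "placeholder caption a"},
    {"image_id": 1, "caption": "placeholder caption b"},
    {"image_id": 2, "caption": "placeholder caption c"},
]


def group_captions(annotations):
    """Map each image_id to the list of its captions (hypothetical helper)."""
    grouped = defaultdict(list)
    for record in annotations:
        grouped[record["image_id"]].append(record["caption"])
    return dict(grouped)


grouped = group_captions(toy_annotations)
print(captions_per_image)
print(grouped)
```

Grouping by image ID is a common first step before evaluation, since caption metrics are typically computed against all reference captions of an image.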



Title: UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning
Authors: Quan Hoang Lam, Quang Duy Le, Van Kiet Nguyen, Ngan Luu-Thuy Nguyen
Publisher: Springer International Publishing
Book: Computational Collective Intelligence
Print ISBN: 978-3-030-63006-5
Electronic ISBN: 978-3-030-63007-2
Copyright year: 2020
DOI: https://doi.org/10.1007/978-3-030-63007-2_57
