
2020 | Original Paper | Book chapter

Image Captioning in Vietnamese Language Based on Deep Learning Network

Authors: Ha Nguyen Tien, Thanh-Ha Do, Van-Anh Nguyen

Published in: Advances in Computational Collective Intelligence

Publisher: Springer International Publishing


Abstract

Image captioning is a fundamental and challenging problem in artificial intelligence. It requires advances in computer vision, to detect objects in an image and the relationships among them, and in text mining, to convert those relationships into words. Although deep-learning and machine-translation approaches have recently achieved state-of-the-art results in English, there is no established approach for generating captions in Vietnamese, a language spoken in Vietnam with complex grammar in which simple words can carry multiple meanings. Moreover, machine translation is hampered by the well-known unknown-words problem, caused by both a large vocabulary and an unbalanced dataset. In this paper, we propose a new approach to generating Vietnamese captions for images, together with a simple and effective solution to the unknown-words problem in machine translation. Overall, the results achieved on a self-built test database are promising.
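The abstract does not detail the authors' solution, but one common remedy for the unknown-words problem it describes is to cap the vocabulary by frequency and map every rare token to a shared `<unk>` placeholder before training. A minimal sketch of that standard technique follows; the threshold and toy corpus are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: limit vocabulary size by token frequency and replace
# out-of-vocabulary tokens with a shared <unk> symbol. The min_count value
# and the toy corpus below are illustrative, not from the paper.
from collections import Counter


def build_vocab(sentences, min_count=2):
    """Keep only tokens seen at least `min_count` times across the corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}


def replace_unknown(sentence, vocab, unk="<unk>"):
    """Map every token outside the vocabulary to the placeholder `unk`."""
    return [tok if tok in vocab else unk for tok in sentence]


corpus = [
    ["a", "dog", "runs"],
    ["a", "cat", "runs"],
    ["a", "dog", "sleeps"],
]
vocab = build_vocab(corpus, min_count=2)
print(replace_unknown(["a", "zebra", "runs"], vocab))  # → ['a', '<unk>', 'runs']
```

Shrinking the output vocabulary this way both reduces the softmax cost of the caption decoder and gives the model a single, well-trained symbol for rare words instead of many poorly estimated ones.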


Metadata
Title
Image Captioning in Vietnamese Language Based on Deep Learning Network
Authors
Ha Nguyen Tien
Thanh-Ha Do
Van-Anh Nguyen
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-63119-2_64