
2020 | Original Paper | Book chapter

Image Captioning in Vietnamese Language Based on Deep Learning Network

Authors: Ha Nguyen Tien, Thanh-Ha Do, Van-Anh Nguyen

Published in: Advances in Computational Collective Intelligence

Publisher: Springer International Publishing


Abstract

Image captioning is a fundamental and challenging problem in artificial intelligence. It requires advances in computer vision, to detect objects in an image and the relationships among them, and in text mining, to convert those relationships into words. Although deep-learning and machine-translation approaches have recently achieved state-of-the-art results in English, there is no established approach for generating captions in Vietnamese, a language spoken in Vietnam with complex grammar in which simple words can carry multiple meanings. Moreover, machine translation is hampered by the well-known unknown-words problem, caused by both a large vocabulary and an unbalanced dataset. In this paper, we propose a new approach to generating Vietnamese captions for images, together with a simple and effective solution to the unknown-words problem in machine translation. Overall, the results achieved on a self-built test database are promising.
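The abstract does not detail the authors' solution, but one common remedy for the unknown-words problem it describes is to cap the vocabulary by frequency and map every rare token to a shared `<unk>` placeholder before training. A minimal sketch of that standard technique follows; the threshold and toy corpus are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: limit vocabulary size by token frequency and replace
# out-of-vocabulary tokens with a shared <unk> symbol. The min_count value
# and the toy corpus below are illustrative, not from the paper.
from collections import Counter


def build_vocab(sentences, min_count=2):
    """Keep only tokens seen at least `min_count` times across the corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}


def replace_unknown(sentence, vocab, unk="<unk>"):
    """Map every token outside the vocabulary to the placeholder `unk`."""
    return [tok if tok in vocab else unk for tok in sentence]


corpus = [
    ["a", "dog", "runs"],
    ["a", "cat", "runs"],
    ["a", "dog", "sleeps"],
]
vocab = build_vocab(corpus, min_count=2)
print(replace_unknown(["a", "zebra", "runs"], vocab))  # → ['a', '<unk>', 'runs']
```

Shrinking the output vocabulary this way both reduces the softmax cost of the caption decoder and gives the model a single, well-trained symbol for rare words instead of many poorly estimated ones.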


Metadata
Title
Image Captioning in Vietnamese Language Based on Deep Learning Network
Authors
Ha Nguyen Tien
Thanh-Ha Do
Van-Anh Nguyen
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-63119-2_64