aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption

Author: Chiranjib Sur

Published in: Neural Processing Letters | Issue 2/2021 | 17-02-2021

Abstract

Regional visual features enhance the generative capability of feature-based captioning models. However, such models lack proper interaction-based attention and often produce biased, uncorrelated, or factually incorrect sentences. In this work, we propose the Attribute Interaction-Tensor Product Representation (aiTPR), a convenient way of gathering more information through orthogonal combination, learning object interactions as physical entities (tensors), and thereby improving captions. Compared with previous works, where features are simply summed into ill-defined feature spaces, TPR keeps the combinations well behaved, and orthogonality keeps the resulting spaces well defined. We introduce a new concept layer that represents objects and their interactions, which can play a crucial role in determining different descriptions. The interaction components contribute heavily to caption quality, and our model outperforms various previous works in this domain on the MSCOCO dataset. For the first time, we introduce the notion of combining regional image features with an abstracted interaction-likelihood embedding for image captioning.
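
To make the orthogonal-combination idea concrete, the following is a minimal numpy sketch of generic tensor-product binding and unbinding in the style of Smolensky's TPR, not the paper's aiTPR architecture; the fillers, the QR-based role construction, and all dimensions are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d_f, n_roles = 8, 4  # filler dimension and number of roles (illustrative)

    # Hypothetical fillers, standing in for regional image features or
    # abstracted interaction embeddings.
    fillers = rng.normal(size=(n_roles, d_f))

    # Orthonormal role vectors: columns of Q from a QR decomposition.
    # Orthogonality is what keeps the combined space well defined.
    Q, _ = np.linalg.qr(rng.normal(size=(n_roles, n_roles)))

    # Bind: T = sum_i outer(f_i, r_i), one tensor holding all filler-role pairs.
    T = sum(np.outer(fillers[i], Q[:, i]) for i in range(n_roles))

    # Unbind: because the roles are orthonormal, T @ r_i returns f_i exactly.
    assert np.allclose(T @ Q[:, 2], fillers[2])

With merely random, non-orthogonal roles, unbinding would recover each filler only approximately; orthogonal combination avoids this cross-talk, which is the "undefined feature space" problem the abstract refers to.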

Metadata

Title: aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption
Author: Chiranjib Sur
Publication date: 17-02-2021
Publisher: Springer US
Published in: Neural Processing Letters | Issue 2/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-021-10438-5
