Published in: International Journal of Machine Learning and Cybernetics 9/2022

12-04-2022 | Original Article

Exploring correlation of relationship reasoning for scene graph generation

Authors: Peng Tian, Hongwei Mo, Laihao Jiang

Abstract

Accurately reasoning about the relationships between objects plays a central role in scene understanding. Due to the complexity of modeling visual relationships and the unbalanced distribution of relationship types, the results obtained by existing methods are far from satisfactory. In this work, we find that the interplay between the contextual information of object pairs and their relationships can effectively regularize the space of visual relationship types and thereby improve the accuracy of relationship reasoning. To this end, we incorporate this interplay into deep neural networks to facilitate scene graph generation by developing a Relationship Reasoning Network (ReRN). Specifically, the model uses a feature updating structure that mutually connects and iteratively updates the semantic features of objects and relationships to explore contextual information between objects. A graph attention mechanism is then used to obtain the correlation information between object pairs and their relationships. Finally, the model uses this correlation information to facilitate the recognition of interactions between objects, while leveraging the mutual connections and joint refinement of different semantic features to improve the accuracy of scene graph generation. Extensive experiments on the Visual Genome dataset demonstrate that our method outperforms other state-of-the-art methods.
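
The abstract does not give the ReRN equations, so the following is only an illustrative sketch of the general idea it describes: object and relationship features that are connected to each other and refined over several rounds, with attention weights deciding how much each neighbor contributes. It uses plain scaled dot-product attention with a residual update, which is an assumption and not the authors' formulation; the names `attend` and `refine`, the toy feature vectors, and the `edges` layout (one subject/object index pair per relationship) are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(node, neighbors):
    """Attend from one feature vector over its neighbors.

    Returns (weights, updated), where updated is the node feature plus
    the attention-weighted sum of neighbor features (residual update).
    """
    scale = math.sqrt(len(node))
    weights = softmax([dot(node, n) / scale for n in neighbors])
    context = [
        sum(w * n[i] for w, n in zip(weights, neighbors))
        for i in range(len(node))
    ]
    updated = [x + c for x, c in zip(node, context)]
    return weights, updated


def refine(objects, relations, edges, steps=2):
    """Alternately refine relationship and object features.

    edges[k] = (subject_index, object_index) for relations[k].
    This alternating schedule is an assumption made for illustration.
    """
    objects = [list(o) for o in objects]
    relations = [list(r) for r in relations]
    for _ in range(steps):
        # Each relationship gathers context from its subject and object.
        relations = [
            attend(r, [objects[s], objects[o]])[1]
            for r, (s, o) in zip(relations, edges)
        ]
        # Each object gathers context from the relationships it takes part in.
        new_objects = []
        for idx, obj in enumerate(objects):
            incident = [r for r, (s, o) in zip(relations, edges) if idx in (s, o)]
            new_objects.append(attend(obj, incident)[1] if incident else obj)
        objects = new_objects
    return objects, relations


# Toy scene: three objects, two candidate relationships (0 -> 1 and 1 -> 2).
toy_objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_relations = [[0.5, 0.5], [0.2, 0.8]]
toy_edges = [(0, 1), (1, 2)]
refined_objects, refined_relations = refine(toy_objects, toy_relations, toy_edges)
```

In a trained model the dot products would be computed on learned projections of the features, and the refined relationship features would feed a classifier over relationship types; both are omitted here to keep the sketch self-contained.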

Metadata
Title
Exploring correlation of relationship reasoning for scene graph generation
Authors
Peng Tian
Hongwei Mo
Laihao Jiang
Publication date
12-04-2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 9/2022
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-022-01538-2
