
10.10.2023 | Short Paper

Bel: Batch Equalization Loss for scene graph generation

Authors: Huihui Li, Baorong Liu, Dongqing Wu, Hang Liu, Lei Guo

Published in: Pattern Analysis and Applications | Issue 4/2023

Abstract

Since scene graphs can serve as the basis for many high-level visual semantic tasks, scene graph generation has attracted increasing attention from researchers. However, most existing works are limited by the long-tailed distribution of the dataset and tend to predict frequent but uninformative predicates such as "on" and "of." From a novel perspective, we find that during training the model promotes the categories present in a batch while suppressing the categories absent from it. Because of the long-tailed data distribution, tail categories are suppressed continuously, which biases the model. To address this problem, we propose a simple and effective method named Batch Equalization Loss, which can be applied to most existing models and brings effective improvement with only a few changes. Notably, our method achieves a more significant improvement on small batches than on large ones. Extensive experiments on the VG150 dataset show that our approach brings significant improvement over existing works. Code will be made available on GitHub in the near future.
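The abstract does not give the exact formulation of Batch Equalization Loss, but the mechanism it describes (promote the ground-truth predicate while avoiding suppression of predicate categories that do not appear in the current batch) can be illustrated with a minimal, hypothetical PyTorch sketch. The function name batch_equalization_loss, the per-class sigmoid formulation, and the class count below are assumptions made for illustration, in the spirit of equalization-style losses; they are not taken from the paper.

```python
# Hypothetical sketch only: the paper's exact loss is not given in the abstract.
# Idea illustrated: predicate categories absent from the batch receive no negative
# (suppressing) gradient, so tail categories are not pushed down by batches in
# which they never occur.
import torch
import torch.nn.functional as F


def batch_equalization_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """logits: (N, C) predicate scores; targets: (N,) ground-truth class indices."""
    num_classes = logits.size(1)
    one_hot = F.one_hot(targets, num_classes).float()        # (N, C)

    # Which predicate categories occur at least once in this batch.
    in_batch = (one_hot.sum(dim=0) > 0).float()               # (C,)

    # Keep every positive term; keep a negative term only if the category is
    # present in the batch, so absent (often tail) categories are not suppressed.
    weights = one_hot + (1.0 - one_hot) * in_batch            # (N, C), broadcast over rows

    probs = torch.sigmoid(logits)                             # per-class binary formulation
    bce = -(one_hot * torch.log(probs.clamp_min(1e-12))
            + (1.0 - one_hot) * torch.log((1.0 - probs).clamp_min(1e-12)))
    return (weights * bce).sum(dim=1).mean()


# Toy usage on VG150-style predicate logits (e.g., 51 classes including background).
logits = torch.randn(8, 51, requires_grad=True)
targets = torch.randint(0, 51, (8,))
batch_equalization_loss(logits, targets).backward()
```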

Footnotes
1
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
 
Metadata
Title
Bel: Batch Equalization Loss for scene graph generation
Authors
Huihui Li
Baorong Liu
Dongqing Wu
Hang Liu
Lei Guo
Publication date
10.10.2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01199-z
