
10.10.2023 | Short Paper

Bel: Batch Equalization Loss for scene graph generation

Authors: Huihui Li, Baorong Liu, Dongqing Wu, Hang Liu, Lei Guo

Published in: Pattern Analysis and Applications | Issue 4/2023

Abstract

Since scene graphs can serve as the basis for many high-level visual semantic tasks, scene graph generation has attracted increasing attention from researchers. However, most existing works are limited by the long-tailed distribution of the dataset and tend to predict frequent but uninformative predicates such as "on" and "of." From a novel perspective, we find that during training the model promotes the categories present in a batch while suppressing the categories absent from it. Because of the long-tailed data distribution, tail categories are suppressed continuously, which biases the model. To address this problem, we propose a simple and effective method named Batch Equalization Loss, which can be applied to most existing models and brings effective improvement with only a few changes. Notably, our method achieves a more significant improvement on small batches than on large ones. Extensive experiments on the VG150 dataset show that our approach brings significant improvement over existing works. Code will be made available on GitHub in the near future.
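The abstract does not give the exact formulation of Batch Equalization Loss, but the mechanism it describes (promote the ground-truth predicate while avoiding suppression of predicate categories that do not appear in the current batch) can be illustrated with a minimal, hypothetical PyTorch sketch. The function name batch_equalization_loss, the per-class sigmoid formulation, and the class count below are assumptions made for illustration, in the spirit of equalization-style losses; they are not taken from the paper.

```python
# Hypothetical sketch only: the paper's exact loss is not given in the abstract.
# Idea illustrated: predicate categories absent from the batch receive no negative
# (suppressing) gradient, so tail categories are not pushed down by batches in
# which they never occur.
import torch
import torch.nn.functional as F


def batch_equalization_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """logits: (N, C) predicate scores; targets: (N,) ground-truth class indices."""
    num_classes = logits.size(1)
    one_hot = F.one_hot(targets, num_classes).float()        # (N, C)

    # Which predicate categories occur at least once in this batch.
    in_batch = (one_hot.sum(dim=0) > 0).float()               # (C,)

    # Keep every positive term; keep a negative term only if the category is
    # present in the batch, so absent (often tail) categories are not suppressed.
    weights = one_hot + (1.0 - one_hot) * in_batch            # (N, C), broadcast over rows

    probs = torch.sigmoid(logits)                             # per-class binary formulation
    bce = -(one_hot * torch.log(probs.clamp_min(1e-12))
            + (1.0 - one_hot) * torch.log((1.0 - probs).clamp_min(1e-12)))
    return (weights * bce).sum(dim=1).mean()


# Toy usage on VG150-style predicate logits (e.g., 51 classes including background).
logits = torch.randn(8, 51, requires_grad=True)
targets = torch.randint(0, 51, (8,))
batch_equalization_loss(logits, targets).backward()
```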

Footnotes
1
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
 
Metadata
Title
Bel: Batch Equalization Loss for scene graph generation
Authors
Huihui Li
Baorong Liu
Dongqing Wu
Hang Liu
Lei Guo
Publication date
10.10.2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01199-z
