Published in: Neural Processing Letters 8/2023

05-07-2023

CAM R-CNN: End-to-End Object Detection with Class Activation Maps

Authors: Shengchuan Zhang, Songlin Yu, Haixin Ding, Jie Hu, Liujuan Cao


Abstract

Class activation maps (CAMs) have been widely used in weakly-supervised object localization, where they generate attention maps for specific categories in an image. Since CAMs can be obtained from category annotations alone, and category annotations are already included in the annotation information of fully-supervised object detection, how to exploit the attention information in CAMs to improve fully-supervised object detection is an interesting problem. In this paper, we propose CAM R-CNN, in which the category-aware attention maps provided by CAMs are integrated into the detection process. CAM R-CNN follows the common pipeline of recent query-based object detectors in an end-to-end fashion, with two key CAM modules embedded into the process. Specifically, the E-CAM module provides embedding-level attention by fusing proposal features with the attention information in CAMs through a transformer encoder, and the S-CAM module supplies spatial-level attention by multiplying feature maps with the top-activated attention map provided by CAMs. In our experiments, CAM R-CNN demonstrates its superiority over several strong baselines on the challenging COCO dataset. Furthermore, we show that the S-CAM module can be applied to two-stage detectors such as Faster R-CNN and Cascade R-CNN with consistent gains.
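The two attention mechanisms described in the abstract lend themselves to a compact sketch. Below is a minimal NumPy illustration of the underlying ideas only: the classic CAM construction (a classifier-weighted sum of feature maps, as in Zhou et al., 2016) and an S-CAM-style spatial reweighting of features by the top-activated class map. The function names, shapes, and normalization choices here are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Classic CAM: a classifier-weighted sum of feature maps,
    CAM_c = sum_k w_{c,k} * F_k, clipped and rescaled to [0, 1].

    features: (K, H, W) feature maps; weights: (C, K) classifier weights.
    """
    cam = np.tensordot(weights[class_idx], features, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)            # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()             # normalize to [0, 1]
    return cam

def spatial_cam_attention(features, cams, class_scores):
    """S-CAM-style spatial attention (sketch): reweight every channel
    of the feature maps with the CAM of the top-activated class.

    cams: (C, H, W) per-class CAMs; class_scores: (C,) image-level scores.
    """
    top_class = int(np.argmax(class_scores))
    attention = cams[top_class]                 # (H, W), values in [0, 1]
    return features * attention[None, :, :]    # broadcast over channels
```

The embedding-level E-CAM module would additionally pass proposal features together with CAM-derived attention tokens through a transformer encoder; that step is omitted here since its exact fusion scheme is specific to the paper.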

Literature
1. Zhang X, Wei Y, Feng J, Yang Y, Huang T (2018) Adversarial complementary learning for weakly supervised object localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1325–1334
2. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2921–2929
3. Pang Y, Xie J, Khan MH, Anwer RM, Khan FS, Shao L (2019) Mask-guided attention network for occluded pedestrian detection. In: International conference on computer vision, pp 4967–4975
4. Zhang S, Yang J, Schiele B (2018) Occluded pedestrian detection through guided attention in CNNs. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6995–7003
5. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision, pp 213–229
6. Sun P, Zhang R, Jiang Y, Kong T, Xu C, Zhan W, Tomizuka M, Li L, Yuan Z, Wang C, Luo P (2021) Sparse R-CNN: end-to-end object detection with learnable proposals. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 14454–14463
7. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
8. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: European conference on computer vision, pp 740–755
9. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
10. Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6154–6162
11. Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
12. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
13. Uijlings JR, van de Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
14. Bodla N, Singh B, Chellappa R, Davis LS (2017) Soft-NMS: improving object detection with one line of code. In: Proceedings of the IEEE international conference on computer vision, pp 5561–5569
15. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
16. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision, pp 21–37
17. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988
18. Law H, Deng J (2018) CornerNet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision, pp 734–750
19. Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE international conference on computer vision, pp 9627–9636
21. Zheng M, Gao P, Wang X, Li H, Dong H (2020) End-to-end object detection with adaptive clustering transformer. CoRR abs/2011.09315
22. Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable DETR: deformable transformers for end-to-end object detection. CoRR abs/2010.04159
23. Sun Z, Cao S, Yang Y, Kitani K (2020) Rethinking transformer-based set prediction for object detection. CoRR abs/2011.10881, pp 3611–3620
24. Gao P, Zheng M, Wang X, Dai J, Li H (2021) Fast convergence of DETR with spatially modulated co-attention. CoRR abs/2101.07448, pp 3621–3630
25. Hu J, Cao L, Lu Y, Zhang S, Wang Y, Li K, Huang F, Shao L, Ji R (2021) ISTR: end-to-end instance segmentation with transformers. arXiv preprint arXiv:2105.00637
26. Hong Q, Liu F, Li D, Liu J, Tian L, Shan Y (2022) Dynamic Sparse R-CNN. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4723–4732
27. Chen S, Sun P, Song Y, Luo P (2022) DiffusionDet: diffusion model for object detection. arXiv preprint arXiv:2211.09788
28. Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883
29. Li R, Mai Z, Trabelsi C, Zhang Z, Jang J, Sanner S (2023) TransCAM: transformer attention-based CAM refinement for weakly supervised semantic segmentation. J Vis Commun Image Represent 92:103800
30. Zhang X, Ma J, Liu H, Hu HM, Yang P (2022) Dual attentional siamese network for visual tracking. Displays 74:102205
31. Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: Proceedings of European conference on computer vision, pp 483–499
32. Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chua TS (2017) SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6298–6306
33. Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6450–6458
34. Hu J, Shen L, Albanie S, Sun G, Wu E (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell, pp 2011–2023
35. Park J, Woo S, Lee JY, Kweon IS (2018) BAM: bottleneck attention module. In: Proceedings of the British machine vision conference, pp 1–14
36. Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of European conference on computer vision, pp 3–19
37. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
38. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
40. Lee H, Kim HE, Nam H (2019) SRM: a style-based recalibration module for convolutional neural networks. In: Proceedings of IEEE/CVF international conference on computer vision, pp 1854–1862
Metadata
Title
CAM R-CNN: End-to-End Object Detection with Class Activation Maps
Authors
Shengchuan Zhang
Songlin Yu
Haixin Ding
Jie Hu
Liujuan Cao
Publication date
05-07-2023
Publisher
Springer US
Published in
Neural Processing Letters / Issue 8/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-023-11335-9
