Published in:

2025 | OriginalPaper | Chapter

MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection

Authors : Ziyue Huang, Yongchao Feng, Qingjie Liu, Yunhong Wang

Published in: Computer Vision – ECCV 2024

Publisher: Springer Nature Switzerland


Abstract

Detection pre-training methods for the DETR series of detectors have been extensively studied in natural scenes, e.g., DETReg. However, detection pre-training remains unexplored in remote sensing scenes. In existing pre-training methods, alignment between object embeddings extracted from a pre-trained backbone and detector features is crucial. However, due to differences in feature extraction methods, a pronounced feature discrepancy still exists and hinders pre-training performance. Remote sensing images, with their complex environments and more densely distributed objects, exacerbate this discrepancy. In this work, we propose a novel Mutually optimizing pre-training framework for remote sensing object Detection, dubbed MutDet. MutDet offers a systematic solution to this challenge. First, we propose a mutual enhancement module, which fuses the object embeddings and detector features bidirectionally in the last encoder layer, enhancing their information interaction. Second, a contrastive alignment loss softly guides this alignment process and simultaneously enhances the discriminability of the detector features. Finally, we design an auxiliary siamese head to mitigate the task gap arising from the introduction of the enhancement module. Comprehensive experiments under various settings show new state-of-the-art transfer performance. The improvement is particularly pronounced when data quantity is limited: when using 10% of the DIOR-R data, MutDet improves DETReg by 6.1% in AP\(_{50}\). Code and models are available at: https://github.com/floatingstarZ/MutDet.
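To make the alignment objective concrete, below is a minimal, hypothetical sketch of an InfoNCE-style contrastive alignment loss between detector features and backbone object embeddings. This is an illustration of the general technique the abstract names, not the authors' exact formulation; the function name, shapes, and temperature value are assumptions.

```python
import numpy as np

def contrastive_alignment_loss(det_feats, obj_embeds, temperature=0.07):
    """InfoNCE-style alignment (illustrative sketch, not MutDet's exact loss).

    det_feats:  (N, D) detector features, one per matched object query.
    obj_embeds: (N, D) object embeddings from the pre-trained backbone,
                row i is the positive pair of det_feats row i.
    """
    # L2-normalize both sets so the dot product is cosine similarity.
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    o = obj_embeds / np.linalg.norm(obj_embeds, axis=1, keepdims=True)
    logits = d @ o.T / temperature  # (N, N): row i vs. all embeddings

    # Softmax cross-entropy with the diagonal entries as positives:
    # each detector feature is pulled toward its own object embedding
    # and pushed away from the other objects in the batch.
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

The "soft" guidance mentioned in the abstract corresponds to the temperature-scaled softmax: unlike a hard L2 matching loss, it only asks each feature to be closer to its own embedding than to the others, which also makes the detector features more discriminative across objects.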


Literature
1. Bar, A., et al.: DETReg: unsupervised pretraining with region priors for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14605–14615 (2022)
2. Bouniot, Q., Audigier, R., Loesch, A., Habrard, A.: Proposal-contrastive pretraining for object detection from fewer data. In: International Conference on Learning Representations (2023)
4. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 33, 9912–9924 (2020)
5. Chang, J., Wang, S., Xu, H.M., Chen, Z., Yang, C., Zhao, F.: DETRDistill: a universal knowledge distillation framework for DETR-families. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6898–6908 (2023)
6. Chen, D., Mei, J.P., Zhang, H., Wang, C., Feng, Y., Chen, C.: Knowledge distillation with the reused teacher classifier. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11933–11942 (2022)
7. Chen, Q., et al.: Group DETR: fast DETR training with group-wise one-to-many assignment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6633–6642 (2023)
8. Chen, X., Xie, S., He, K.: An empirical study of training self-supervised vision transformers. In: 2021 IEEE/CVF International Conference on Computer Vision, pp. 9620–9629 (2021)
9. Cheng, G., et al.: Anchor-free oriented proposal generator for object detection. IEEE Trans. Geosci. Remote Sens. 60, 1–11 (2022)
10. Dai, L., Liu, H., Tang, H., Wu, Z., Song, P.: AO2-DETR: arbitrary-oriented object detection transformer. IEEE Trans. Circuits Syst. Video Technol. 33(5), 2342–2356 (2023)
11. Dai, Z., Cai, B., Lin, Y., Chen, J.: UP-DETR: unsupervised pre-training for object detection with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1601–1610 (2021)
12. Ding, J., Xue, N., Long, Y., Xia, G.S., Lu, Q.: Learning ROI transformer for oriented object detection in aerial images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2849–2858 (2019)
13. Grill, J.B., et al.: Bootstrap your own latent: a new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 33, 21271–21284 (2020)
14. Han, J., Ding, J., Xue, N., Xia, G.S.: ReDet: a rotation-equivariant detector for aerial object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2786–2795 (2021)
15. Hu, Z., et al.: EMO2-DETR: efficient-matching oriented object detection with transformers. IEEE Trans. Geosci. Remote Sens. (2023)
16. Huang, G., Laradji, I., Vazquez, D., Lacoste-Julien, S., Rodriguez, P.: A survey of self-supervised and few-shot object detection. IEEE Trans. Pattern Anal. Mach. Intell. 45(4), 4071–4089 (2022)
17. Huang, G., et al.: Siamese DETR. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15722–15731 (2023)
18. Kirillov, A., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026 (2023)
19. Lee, H., Song, M., Koo, J., Seo, J.: RHINO: rotated DETR with dynamic denoising via Hungarian matching for oriented object detection. arXiv preprint arXiv:2305.07598 (2023)
20. Li, K., Wan, G., Cheng, G., Meng, L., Han, J.: Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 159, 296–307 (2020)
21. Li, M., et al.: AlignDet: aligning pre-training and fine-tuning in object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6866–6876 (2023)
22. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
23. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
24. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019)
26. Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019)
27. Sun, X., et al.: FAIR1M: a benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 184, 116–130 (2022)
28. Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vision 104, 154–171 (2013)
29. Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
30. Wang, J., Chen, Y., Zheng, Z., Li, X., Cheng, M.M., Hou, Q.: CrossKD: cross-head knowledge distillation for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16520–16530 (2024)
31. Wang, Y., et al.: Revisiting the transferability of supervised pretraining: an MLP perspective. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9183–9193 (2022)
32. Wei, F., Gao, Y., Wu, Z., Hu, H., Lin, S.: Aligning pretraining for detection via object-level contrastive learning. Adv. Neural Inf. Process. Syst. 34, 22682–22694 (2021)
33. Xia, G.S., et al.: DOTA: a large-scale dataset for object detection in aerial images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3974–3983 (2018)
34. Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented R-CNN for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3520–3529 (2021)
36. Yang, C., An, Z., Cai, L., Xu, Y.: Mutual contrastive learning for visual representation learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 3045–3053 (2022)
37. Yang, X., Yan, J.: On the arbitrary-oriented object detection: classification based approaches revisited. Int. J. Comput. Vision 130(5), 1340–1365 (2022)
38. Yang, X., et al.: Detecting rotated objects as gaussian distributions and its 3-D generalization. IEEE Trans. Pattern Anal. Mach. Intell. 45(4), 4335–4354 (2023)
40. Yu, Y., Da, F.: Phase-shifting coder: predicting accurate orientation in oriented object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13354–13363 (2023)
41. Zeng, Y., Chen, Y., Yang, X., Li, Q., Yan, J.: ARS-DETR: aspect ratio-sensitive detection transformer for aerial oriented object detection. IEEE Trans. Geosci. Remote Sens. 62, 1–15 (2024)
42. Zhang, H., et al.: DINO: DETR with improved denoising anchor boxes for end-to-end object detection. In: The Eleventh International Conference on Learning Representations (2023)
43. Zhang, Y., Yuan, Y., Feng, Y., Lu, X.: Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection. IEEE Trans. Geosci. Remote Sens. 57(8), 5535–5548 (2019)
44. Zhao, Z., Li, S.: ABFL: angular boundary discontinuity free loss for arbitrary oriented object detection in aerial images. IEEE Trans. Geosci. Remote Sens. (2024)
45. Zheng, Z., et al.: Localization distillation for dense object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9407–9416 (2022)
46. Zhou, Y., et al.: MMRotate: a rotated object detection benchmark using pytorch. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 7331–7334 (2022)
47. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. In: International Conference on Learning Representations (2021)
Metadata
Title
MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection
Authors
Ziyue Huang
Yongchao Feng
Qingjie Liu
Yunhong Wang
Copyright Year
2025
DOI
https://doi.org/10.1007/978-3-031-73254-6_1