2024 | Original Paper | Book Chapter

3D Small Object Detection from Cameras and Point Clouds Using Five-Head Attention in a Fusion Method

Authors: Haogang Mao, Jichao Jiao, Jialun Li, Yang Fuxing

Published in: China Satellite Navigation Conference (CSNC 2024) Proceedings

Publisher: Springer Nature Singapore

Abstract

In this paper, we focus on 3D point clouds, multimodal data fusion, and attention mechanisms. A survey of related research on 3D object detection based on multimodal data fusion reveals three problems: (1) the detection accuracy for small objects, such as pedestrians and bicycles, is unsatisfactory; (2) jointly training two different models is less efficient than training a single model; and (3) when the pseudo-image generated from the features contains long-range objects, existing methods lose their original high accuracy, and the model generalizes poorly. To solve these problems and improve the detection performance of single-modality detectors, this paper introduces a new fusion network that mainly consists of a five-head attention module and a posterior decision fusion (CPFN) module. The five-head module suppresses noise interference by jointly considering channel, spatial, point, and voxel attention, while enhancing the understanding of key information about the object. Additionally, CPFNet uses a PointPillars network with an attention mechanism for decision fusion with a Cascade R-CNN network. Experimental results on the validation set of the KITTI dataset show that our proposed method far outperforms existing methods in the small sample category, whether compared to state-of-the-art fusion-based methods or point cloud neural networks.
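
The abstract does not say how the decision fusion combines the two detectors' outputs. As a rough illustration of decision-level (late) fusion in general, the sketch below raises the confidence of a LiDAR candidate when a camera detection overlaps its image-plane projection. Everything in it is an assumption made for illustration and is not taken from the paper: the `Candidate` type, the `fuse_scores` function, the noisy-OR combination rule, and the IoU threshold of 0.5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


@dataclass
class Candidate:
    box: Box      # 2D box; for a LiDAR candidate, its 3D box projected into the image
    score: float  # detector confidence in [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_scores(lidar: List[Candidate], camera: List[Candidate],
                iou_threshold: float = 0.5) -> List[float]:
    """Return one fused score per LiDAR candidate (hypothetical noisy-OR rule)."""
    fused = []
    for cand in lidar:
        # Camera detection that overlaps this candidate the most, if any exist.
        best = max(camera, key=lambda c: iou(cand.box, c.box), default=None)
        if best is not None and iou(cand.box, best.box) >= iou_threshold:
            # Both modalities agree: combine the scores with a noisy-OR.
            fused.append(1.0 - (1.0 - cand.score) * (1.0 - best.score))
        else:
            # No camera support: keep the LiDAR score unchanged.
            fused.append(cand.score)
    return fused
```

With this rule, a low-confidence LiDAR candidate (for example a distant pedestrian with few points) gains confidence only when the image detector independently reports an object in the same place. A learned fusion layer, as used in some published camera-LiDAR fusion methods, would replace the fixed rule with trained weights.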

Metadata
Title
3D Small Object Detection from Cameras and Point Clouds Using Five-Head Attention in a Fusion Method
Authors
Haogang Mao
Jichao Jiao
Jialun Li
Yang Fuxing
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-6944-9_39
