Skip to main content

2024 | OriginalPaper | Buchkapitel

Efficient Multi-object Detection for Complexity Spatio-Temporal Scenes

verfasst von : Kai Wang, Xiangyu Song, Shijie Sun, Juan Zhao, Cai Xu, Huansheng Song

Erschienen in: Web and Big Data

Verlag: Springer Nature Singapore

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Multi-Object detection in traffic scenarios plays a crucial role in ensuring the safety of people and property, as well as facilitating the smooth flow of traffic on roads. However, the existing algorithms are inefficient in detecting real scenarios due to the following drawbacks: (1) a scarcity of traffic scene datasets; (2) a lack of tailoring for specific scenarios; and (3) high computational complexity, which hinders practical use. In this paper, we propose a solution to eliminate these drawbacks. Specifically, we introduce a Full-Scene Traffic Dataset (FSTD) with Spatio-temporal features that includes multiple views, multiple scenes, and multiple objectives. Additionally, we propose the improved YOLOv7 model with redesigned BiFusion, NWD and SPPFCSPC modules (BNF-YOLOv7), which is a lightweight and efficient approach that addresses the intricacies of multi-object detection in traffic scenarios. BNF-YOLOv7 is achieved through several improvements over YOLOv7, including the use of the BiFusion feature fusion module, the NWD approach, and the redesign of the loss function. First, we improve the SPPCSPC structure to obtain SPPFCSPC, which maintains the same receptive field while achieving speedup. Second, we use the BiFusion feature fusion module to enhance feature representation capability and improve positional information of objects. Additionally, we introduce NWD and redesign the loss function to address the detection of tiny objects in traffic scenarios. Experiments on the FSTD and UA-DETRAC dataset show that BNF-YOLOv7 outperforms other algorithms with a 3.3% increase in mAP on FSTD and a 2.4% increase on UA-DETRAC. Additionally, BNF-YOLOv7 maintains significantly better real-time performance, increasing the FPS by 10% in real scenarios.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv:2004.10934 [cs, eess] (2020) Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv:​2004.​10934 [cs, eess] (2020)
2.
Zurück zum Zitat Cai, Z., Vasconcelos, N.: Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1483–1498 (2019)CrossRef Cai, Z., Vasconcelos, N.: Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1483–1498 (2019)CrossRef
3.
Zurück zum Zitat Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016) Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
4.
Zurück zum Zitat Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: Centernet: keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6569–6578 (2019) Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: Centernet: keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6569–6578 (2019)
5.
Zurück zum Zitat Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 107, 3–11 (2018)CrossRef Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 107, 3–11 (2018)CrossRef
6.
Zurück zum Zitat Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The kitti vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012) Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The kitti vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
7.
Zurück zum Zitat Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
8.
Zurück zum Zitat Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
9.
Zurück zum Zitat He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)CrossRef He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)CrossRef
10.
Zurück zum Zitat He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
11.
Zurück zum Zitat Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. PMLR (2015) Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. PMLR (2015)
16.
Zurück zum Zitat Liang, M., Yang, B., Chen, Y., Hu, R., Urtasun, R.: Multi-task multi-sensor fusion for 3D object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7345–7353 (2019) Liang, M., Yang, B., Chen, Y., Hu, R., Urtasun, R.: Multi-task multi-sensor fusion for 3D object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7345–7353 (2019)
18.
Zurück zum Zitat Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018) Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)
20.
Zurück zum Zitat Marriott, R.T., Romdhani, S., Chen, L.: A 3D GAN for improved large-pose facial recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13445–13455 (2021) Marriott, R.T., Romdhani, S., Chen, L.: A 3D GAN for improved large-pose facial recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13445–13455 (2021)
21.
Zurück zum Zitat Qin, L., et al.: Id-yolo: real-time salient object detection based on the driver’s fixation region. IEEE Trans. Intell. Transp. Syst. 23(9), 15898–15908 (2022)CrossRef Qin, L., et al.: Id-yolo: real-time salient object detection based on the driver’s fixation region. IEEE Trans. Intell. Transp. Syst. 23(9), 15898–15908 (2022)CrossRef
22.
Zurück zum Zitat Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
23.
Zurück zum Zitat Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017) Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
25.
Zurück zum Zitat Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. syst. 28 (2015) Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. syst. 28 (2015)
26.
Zurück zum Zitat Song, X., et al.: A survey on deep learning based knowledge tracing. Knowl.-Based Syst. 258, 110036 (2022)CrossRef Song, X., et al.: A survey on deep learning based knowledge tracing. Knowl.-Based Syst. 258, 110036 (2022)CrossRef
27.
Zurück zum Zitat Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015) Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
28.
Zurück zum Zitat Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 (2022) Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:​2207.​02696 (2022)
29.
Zurück zum Zitat Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., Yeh, I.H.: CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 390–391 (2020) Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., Yeh, I.H.: CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 390–391 (2020)
30.
Zurück zum Zitat Wang, F., Xu, J., Liu, C., Zhou, R., Zhao, P.: On prediction of traffic flows in smart cities: a multitask deep learning based approach. World Wide Web 24, 805–823 (2021)CrossRef Wang, F., Xu, J., Liu, C., Zhou, R., Zhao, P.: On prediction of traffic flows in smart cities: a multitask deep learning based approach. World Wide Web 24, 805–823 (2021)CrossRef
31.
Zurück zum Zitat Wang, J., Xu, C., Yang, W., Yu, L.: A normalized gaussian wasserstein distance for tiny object detection. arXiv preprint arXiv:2110.13389 (2021) Wang, J., Xu, C., Yang, W., Yu, L.: A normalized gaussian wasserstein distance for tiny object detection. arXiv preprint arXiv:​2110.​13389 (2021)
32.
Zurück zum Zitat Wang, L., et al.: Model: motif-based deep feature learning for link prediction. IEEE Trans. Comput. Soc. Syst. 7(2), 503–516 (2020)CrossRef Wang, L., et al.: Model: motif-based deep feature learning for link prediction. IEEE Trans. Comput. Soc. Syst. 7(2), 503–516 (2020)CrossRef
33.
Zurück zum Zitat Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst, 193, 102907 (2020) Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst, 193, 102907 (2020)
34.
Zurück zum Zitat Xu, C., et al.: Uncertainty-aware multi-view deep learning for internet of things applications. IEEE Trans. Industr. Inf. 19(2), 1456–1466 (2022)CrossRef Xu, C., et al.: Uncertainty-aware multi-view deep learning for internet of things applications. IEEE Trans. Industr. Inf. 19(2), 1456–1466 (2022)CrossRef
35.
Zurück zum Zitat Yang, X., Yan, J., Ming, Q., Wang, W., Zhang, X., Tian, Q.: Rethinking rotated object detection with gaussian wasserstein distance loss. In: International Conference on Machine Learning, pp. 11830–11841. PMLR (2021) Yang, X., Yan, J., Ming, Q., Wang, W., Zhang, X., Tian, Q.: Rethinking rotated object detection with gaussian wasserstein distance loss. In: International Conference on Machine Learning, pp. 11830–11841. PMLR (2021)
36.
Zurück zum Zitat Yin, H., Yang, S., Song, X., Liu, W., Li, J.: Deep fusion of multimodal features for social media retweet time prediction. World Wide Web 24, 1027–1044 (2021)CrossRef Yin, H., Yang, S., Song, X., Liu, W., Li, J.: Deep fusion of multimodal features for social media retweet time prediction. World Wide Web 24, 1027–1044 (2021)CrossRef
37.
Zurück zum Zitat Yu, F., et al.: Bdd100k: a diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2636–2645 (2020) Yu, F., et al.: Bdd100k: a diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2636–2645 (2020)
38.
Zurück zum Zitat Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520 (2016) Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520 (2016)
40.
Zurück zum Zitat Zheng, Z., et al.: Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 52(8), 8574–8586 (2021)CrossRef Zheng, Z., et al.: Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 52(8), 8574–8586 (2021)CrossRef
Metadaten
Titel
Efficient Multi-object Detection for Complexity Spatio-Temporal Scenes
verfasst von
Kai Wang
Xiangyu Song
Shijie Sun
Juan Zhao
Cai Xu
Huansheng Song
Copyright-Jahr
2024
Verlag
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2421-5_13

Premium Partner