2024 | OriginalPaper | Book chapter

Efficient Multi-object Detection for Complexity Spatio-Temporal Scenes

Authors: Kai Wang, Xiangyu Song, Shijie Sun, Juan Zhao, Cai Xu, Huansheng Song

Published in: Web and Big Data

Publisher: Springer Nature Singapore

Abstract

Multi-object detection in traffic scenarios plays a crucial role in protecting people and property and in keeping road traffic flowing smoothly. However, existing algorithms are inefficient in real scenarios because of three drawbacks: (1) a scarcity of traffic scene datasets; (2) a lack of tailoring to specific scenarios; and (3) high computational complexity, which hinders practical use. In this paper, we propose a solution that addresses these drawbacks. Specifically, we introduce a Full-Scene Traffic Dataset (FSTD) with spatio-temporal features that covers multiple views, multiple scenes, and multiple objectives. We also propose BNF-YOLOv7, an improved YOLOv7 model with redesigned BiFusion, NWD, and SPPFCSPC modules; it is a lightweight and efficient approach to the intricacies of multi-object detection in traffic scenarios. First, we improve the SPPCSPC structure to obtain SPPFCSPC, which maintains the same receptive field while running faster. Second, we use the BiFusion feature fusion module to enhance feature representation and improve the positional information of objects. Third, we introduce NWD and redesign the loss function to improve the detection of tiny objects in traffic scenarios. Experiments on the FSTD and UA-DETRAC datasets show that BNF-YOLOv7 outperforms other algorithms, with a 3.3% increase in mAP on FSTD and a 2.4% increase on UA-DETRAC. BNF-YOLOv7 also maintains significantly better real-time performance, increasing FPS by 10% in real scenarios.
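The abstract does not spell out how NWD is computed or how the loss is redesigned. As a rough illustration only, the sketch below follows the general normalized Gaussian Wasserstein distance idea of Wang et al. (2021): each box (cx, cy, w, h) is treated as a 2D Gaussian, and the distance between two such Gaussians is mapped into (0, 1]. The function names `nwd` and `iou`, the example boxes, and the constant `c=12.8` are assumptions made for this example and are not taken from the chapter.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two (cx, cy, w, h) boxes.

    `c` is a dataset-dependent normalizing constant; 12.8 is only a
    placeholder assumed for this sketch.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Each box is modeled as a Gaussian with mean (cx, cy) and covariance
    # diag(w^2 / 4, h^2 / 4). For such Gaussians the squared 2-Wasserstein
    # distance reduces to this sum of squared differences.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def iou(box_a, box_b):
    """Plain intersection over union for (cx, cy, w, h) boxes, for comparison."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    inter_w = max(0.0, min(cxa + wa / 2, cxb + wb / 2) - max(cxa - wa / 2, cxb - wb / 2))
    inter_h = max(0.0, min(cya + ha / 2, cyb + hb / 2) - max(cya - ha / 2, cyb - hb / 2))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    return inter / union


# Two tiny 4x4 boxes, 5 pixels apart: they do not overlap at all.
tiny_a = (10.0, 10.0, 4.0, 4.0)
tiny_b = (15.0, 10.0, 4.0, 4.0)
print("IoU:", iou(tiny_a, tiny_b))
print("NWD:", nwd(tiny_a, tiny_b))
```

For tiny objects, a shift of a few pixels can drive IoU to zero, which leaves an IoU-based loss with no useful signal, whereas NWD still varies smoothly with the offset. A loss term of the form `1 - nwd(pred, target)` is one common way to use it; whether BNF-YOLOv7 combines it with other terms in this way is not stated in the abstract.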



Print ISBN: 978-981-9724-20-8
Electronic ISBN: 978-981-9724-21-5
Copyright year: 2024
DOI: https://doi.org/10.1007/978-981-97-2421-5_13
