
11.05.2022 | Regular Paper

InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection

Authors: Sweta Panigrahi, U. S. N. Raju

Published in: International Journal of Multimedia Information Retrieval | Issue 3/2022


Abstract

Pedestrian detection is one of the most challenging research areas in computer vision, as it involves both classifying the image and localizing the pedestrian. Its applications, especially in automated surveillance and robotics, are in high demand. Compared to traditional hand-crafted methods, convolutional neural networks (CNNs) achieve superior detection results. Single-stage detection networks, particularly the You Only Look Once (YOLO) network, attain satisfactory object detection performance without compromising computation speed and are among the state-of-the-art CNN-based methods. The YOLO framework can also be leveraged for pedestrian detection. In this work, we propose an improved YOLOv2, called InceptionDepth-wiseYOLOv2. The proposed model uses a modified DarkNet53 engineered for robust feature formation: three inception depth-wise convolution modules are integrated at varying levels in DarkNet53, producing a more comprehensive feature representation of an object in the image. The proposed method is compared with state-of-the-art detectors, i.e., Faster R-CNN, YOLOv2 with various base networks, YOLOv3, and the Single Shot MultiBox Detector. The Detection Error Trade-off curve, Precision–Recall curve, Log Average Miss Rate, and Average Precision are used to compare the methods. An analysis of the number of pedestrians detected with respect to their height is also carried out. The experimental study uses three benchmark pedestrian datasets: INRIA Pedestrian, PASCAL VOC 2012, and Caltech Pedestrian.
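For readers unfamiliar with the building block named in the abstract, the sketch below illustrates one plausible form of an inception depth-wise convolution module: parallel depth-wise separable branches with different kernel sizes whose outputs are concatenated, Inception-style, before being passed back into the backbone. This is a minimal PyTorch sketch under assumed settings (branch layout, channel counts, activation); it is not the paper's exact configuration of the three modules inserted into DarkNet53.

```python
# Minimal sketch of an "inception depth-wise convolution" module.
# Branch kernel sizes, channel widths and activations are illustrative
# assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depth-wise conv followed by a 1x1 point-wise conv, with BN and activation."""

    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        padding = kernel_size // 2
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class InceptionDepthwiseModule(nn.Module):
    """Parallel branches (1x1 conv, 3x3 and 5x5 depth-wise separable convs)
    whose outputs are concatenated along the channel dimension."""

    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, branch_ch, 1, bias=False)
        self.branch3 = DepthwiseSeparableConv(in_ch, branch_ch, 3)
        self.branch5 = DepthwiseSeparableConv(in_ch, branch_ch, 5)

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)


# Example: apply the module to a feature map from an intermediate backbone stage.
feats = torch.randn(1, 256, 52, 52)
module = InceptionDepthwiseModule(in_ch=256, branch_ch=128)
print(module(feats).shape)  # torch.Size([1, 384, 52, 52])
```

The depth-wise separable branches keep the parameter count low while the mixed kernel sizes give each location a multi-scale receptive field, which is the usual motivation for combining Inception-style branching with depth-wise convolutions.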


Metadata
Title
InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection
Authors
Sweta Panigrahi
U. S. N. Raju
Publication date
11.05.2022
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 3/2022
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-022-00239-4
