
01.09.2020 | Original Paper

WatchNet++: efficient and accurate depth-based network for detecting people attacks and intrusion

Authors: M. Villamizar, A. Martínez-González, O. Canévet, J.-M. Odobez

Published in: Machine Vision and Applications | Issue 6/2020

Abstract

We present an efficient and accurate people detection approach based on deep learning to detect people attacks and intrusion in video surveillance scenarios. Unlike other approaches that rely on background segmentation and pre-processing techniques, which cannot distinguish people from other elements in the scene, we propose WatchNet++, a depth-based sequential network that localizes people in top-view depth images by predicting human body joints and pairwise connections (links), such as head and shoulders. WatchNet++ comprises a set of prediction stages and up-sampling operations that progressively refine the predictions of joints and links, leading to more accurate localization results. In order to train the network with varied and abundant data, we also present a large synthetic dataset of depth images with human models, which is used to pre-train the network. Domain adaptation to real data is then performed by fine-tuning on a real dataset of depth images with people performing attacks and intrusion. An extensive evaluation of the proposed approach is conducted for the detection of attacks in airlocks and for people counting indoors and outdoors, showing high detection scores and efficiency. The network runs at 10 and 28 FPS on CPU and GPU, respectively.
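As a rough illustration of the stage-wise design described in the abstract (a backbone over the depth image followed by prediction stages that output joint and link heatmaps, refined across stages and up-sampled), the PyTorch sketch below shows one possible structure. It is a minimal sketch only: the number of joints and links, the number of stages, the channel widths, the up-sampling factor, and the use of PyTorch are assumptions for illustration, not the authors' actual implementation.

# Illustrative multi-stage joint/link heatmap network for top-view depth images.
# All sizes and stage counts are assumptions, not the published architecture.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # 3x3 convolution + batch normalization + ReLU, a common building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class WatchNetSketch(nn.Module):
    """Predicts heatmaps for body joints (e.g., head, shoulders) and pairwise
    links, refining them over several prediction stages."""

    def __init__(self, n_joints=3, n_links=2, n_stages=3, width=64):
        super().__init__()
        out_ch = n_joints + n_links
        # Backbone: extract features from the single-channel depth image
        self.backbone = nn.Sequential(
            conv_block(1, width), nn.MaxPool2d(2),
            conv_block(width, width), nn.MaxPool2d(2),
            conv_block(width, width),
        )
        # Prediction stages: later stages also see the previous stage's maps
        self.stages = nn.ModuleList()
        for s in range(n_stages):
            in_ch = width if s == 0 else width + out_ch
            self.stages.append(nn.Sequential(
                conv_block(in_ch, width),
                conv_block(width, width),
                nn.Conv2d(width, out_ch, kernel_size=1),  # joint + link heatmaps
            ))
        # Up-sample the coarse predictions back towards the input resolution
        self.upsample = nn.Upsample(scale_factor=4, mode='bilinear',
                                    align_corners=False)

    def forward(self, depth):
        feats = self.backbone(depth)
        preds = []
        x = feats
        for stage in self.stages:
            p = stage(x)                      # heatmaps predicted at this stage
            preds.append(self.upsample(p))    # keep each stage's output (deep supervision)
            x = torch.cat([feats, p], dim=1)  # next stage refines features + previous maps
        return preds


# Example: a 128x128 depth frame yields one set of up-sampled heatmaps per stage
maps = WatchNetSketch()(torch.randn(1, 1, 128, 128))

Feeding each stage both the backbone features and the previous stage's heatmaps is what allows the predictions of joints and links to be progressively refined, in the spirit of the cascaded refinement the abstract describes.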


Metadata
Title
WatchNet++: efficient and accurate depth-based network for detecting people attacks and intrusion
Authors
M. Villamizar
A. Martínez-González
O. Canévet
J.-M. Odobez
Publication date
01.09.2020
Publisher
Springer Berlin Heidelberg
Published in
Machine Vision and Applications / Issue 6/2020
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI
https://doi.org/10.1007/s00138-020-01089-y
