
01.09.2020 | Original Paper

WatchNet++: efficient and accurate depth-based network for detecting people attacks and intrusion

Authors: M. Villamizar, A. Martínez-González, O. Canévet, J.-M. Odobez

Published in: Machine Vision and Applications | Issue 6/2020

Abstract

We present an efficient and accurate people detection approach based on deep learning to detect people attacks and intrusion in video surveillance scenarios. Unlike other approaches that rely on background segmentation and pre-processing techniques, which cannot distinguish people from other elements in the scene, we propose WatchNet++, a depth-based sequential network that localizes people in top-view depth images by predicting human body joints and pairwise connections (links), such as head and shoulders. WatchNet++ comprises a set of prediction stages and up-sampling operations that progressively refine the predictions of joints and links, leading to more accurate localization results. In order to train the network with varied and abundant data, we also present a large synthetic dataset of depth images with human models, which is used to pre-train the network. Domain adaptation to real data is then performed by fine-tuning on a real dataset of depth images with people performing attacks and intrusion. An extensive evaluation of the proposed approach is conducted for the detection of attacks in airlocks and for people counting indoors and outdoors, showing high detection scores and efficiency. The network runs at 10 and 28 FPS on CPU and GPU, respectively.
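As a rough illustration of the stage-wise design described in the abstract (a backbone over the depth image followed by prediction stages that output joint and link heatmaps, refined across stages and up-sampled), the PyTorch sketch below shows one possible structure. It is a minimal sketch only: the number of joints and links, the number of stages, the channel widths, the up-sampling factor, and the use of PyTorch are assumptions for illustration, not the authors' actual implementation.

# Illustrative multi-stage joint/link heatmap network for top-view depth images.
# All sizes and stage counts are assumptions, not the published architecture.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # 3x3 convolution + batch normalization + ReLU, a common building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class WatchNetSketch(nn.Module):
    """Predicts heatmaps for body joints (e.g., head, shoulders) and pairwise
    links, refining them over several prediction stages."""

    def __init__(self, n_joints=3, n_links=2, n_stages=3, width=64):
        super().__init__()
        out_ch = n_joints + n_links
        # Backbone: extract features from the single-channel depth image
        self.backbone = nn.Sequential(
            conv_block(1, width), nn.MaxPool2d(2),
            conv_block(width, width), nn.MaxPool2d(2),
            conv_block(width, width),
        )
        # Prediction stages: later stages also see the previous stage's maps
        self.stages = nn.ModuleList()
        for s in range(n_stages):
            in_ch = width if s == 0 else width + out_ch
            self.stages.append(nn.Sequential(
                conv_block(in_ch, width),
                conv_block(width, width),
                nn.Conv2d(width, out_ch, kernel_size=1),  # joint + link heatmaps
            ))
        # Up-sample the coarse predictions back towards the input resolution
        self.upsample = nn.Upsample(scale_factor=4, mode='bilinear',
                                    align_corners=False)

    def forward(self, depth):
        feats = self.backbone(depth)
        preds = []
        x = feats
        for stage in self.stages:
            p = stage(x)                      # heatmaps predicted at this stage
            preds.append(self.upsample(p))    # keep each stage's output (deep supervision)
            x = torch.cat([feats, p], dim=1)  # next stage refines features + previous maps
        return preds


# Example: a 128x128 depth frame yields one set of up-sampled heatmaps per stage
maps = WatchNetSketch()(torch.randn(1, 1, 128, 128))

Feeding each stage both the backbone features and the previous stage's heatmaps is what allows the predictions of joints and links to be progressively refined, in the spirit of the cascaded refinement the abstract describes.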


Metadata
Title
WatchNet++: efficient and accurate depth-based network for detecting people attacks and intrusion
Authors
M. Villamizar
A. Martínez-González
O. Canévet
J.-M. Odobez
Publication date
01.09.2020
Publisher
Springer Berlin Heidelberg
Published in
Machine Vision and Applications / Issue 6/2020
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI
https://doi.org/10.1007/s00138-020-01089-y
