Published in: International Journal of Multimedia Information Retrieval 4/2021

31-10-2021 | Regular Paper

AMS-CNN: Attentive multi-stream CNN for video-based crowd counting

Authors: Santosh Kumar Tripathy, Rajeev Srivastava

Abstract

In recent years, video-based crowd counting and density estimation (CCDE) has become essential for crowd analysis. Current approaches rarely exploit spatial–temporal features for CCDE, and they usually take no measures to reduce the influence of the frame background when estimating crowd density maps, which results in higher Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Attention to each individual feature set's response toward crowd counting is likewise neglected. To address these gaps, we design an end-to-end trainable attentive multi-stream convolutional neural network (AMS-CNN) for crowd counting. First, a multi-stream CNN (MS-CNN) is designed to obtain crowd density maps. The MS-CNN comprises three streams that fuse deep spatial, temporal, and spatial foreground features drawn from three cues of the crowd video: individual frames, volumes of frames, and frame foregrounds. To improve accuracy, we design three stream-wise attention modules that generate attentive crowd density maps, and their relative average is computed by a relative averaged attentive density-map (RAAD) layer. The relative averaged density map is concatenated with the MS-CNN output and passed through two-stage CNN blocks to produce the final density map. Experiments are conducted on three publicly available crowd video datasets: Mall, UCSD, and Venice. The proposed approach obtains better MAE and RMSE than the compared state-of-the-art approaches.
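
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual convention in density-map counting that the estimated count for a frame is the sum of its predicted density map, and it computes MAE and RMSE over per-frame counts. The function names and the toy density maps are hypothetical and are not taken from the paper.

```python
import math


def count_from_density(density_map):
    """Estimated crowd count: the sum of all values in a 2-D density map."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean Absolute Error between predicted and ground-truth counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def rmse(pred_counts, true_counts):
    """Root Mean Squared Error between predicted and ground-truth counts."""
    n = len(true_counts)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n)


# Toy, made-up density maps for two frames (illustration only).
pred_maps = [
    [[0.5, 0.5], [1.0, 0.0]],  # sums to 2.0
    [[1.0, 1.0], [1.0, 2.0]],  # sums to 5.0
]
true_counts = [3.0, 4.0]

pred_counts = [count_from_density(m) for m in pred_maps]
print("MAE :", mae(pred_counts, true_counts))
print("RMSE:", rmse(pred_counts, true_counts))
```

Lower values are better for both metrics. RMSE penalizes large per-frame errors more heavily than MAE, so the two are typically reported together.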

Metadata
Title: AMS-CNN: Attentive multi-stream CNN for video-based crowd counting
Authors: Santosh Kumar Tripathy, Rajeev Srivastava
Publication date: 31-10-2021
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval, Issue 4/2021
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-021-00220-7
