Published in: Neural Computing and Applications 7/2020

13 October 2018 | Original Article

Single-column CNN for crowd counting with pixel-wise attention mechanism

Authors: Bisheng Wang, Guo Cao, Yanfeng Shang, Licun Zhou, Youqiang Zhang, Xuesong Li

Abstract

This paper presents a novel method for accurate people counting in highly dense crowd images. The proposed method consists of three modules: extracting foreground regions (EF), a pixel-wise attention mechanism (PAM), and a single-column density map estimator (S-DME). EF uses a fully convolutional network to suppress interference from complex backgrounds efficiently. PAM classifies crowd images pixel by pixel to generate high-quality local crowd density maps. S-DME is a carefully designed single-column network that learns more representative features with far fewer parameters. In addition, two new evaluation metrics are introduced to give a comprehensive view of how the different modules of the algorithm perform. Experiments demonstrate that our approach achieves state-of-the-art results on several challenging datasets, including our own dataset, which features highly cluttered environments and varied camera perspectives.
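The way the three modules could fit together can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no architectural details, so the function names, the binary foreground mask, the integer density-level labels, and the idea of splitting one density map into per-level local maps are all assumptions made for illustration. The only standard fact used is that, in density-map-based counting, the estimated count is the sum of the density map.

```python
from typing import List

Grid = List[List[float]]


def apply_foreground_mask(density: Grid, mask: Grid) -> Grid:
    """Zero the density wherever a (hypothetical) EF mask marks background (0)."""
    return [[d * m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(density, mask)]


def split_by_level(density: Grid, labels: List[List[int]], num_levels: int) -> List[Grid]:
    """Split a density map into per-level local maps using per-pixel class labels.

    The labels stand in for a pixel-wise classification such as the one PAM
    is described as performing; this exact usage is an assumption.
    """
    return [
        [[d if lab == level else 0.0 for d, lab in zip(d_row, l_row)]
         for d_row, l_row in zip(density, labels)]
        for level in range(num_levels)
    ]


def crowd_count(density: Grid) -> float:
    """The estimated count is the sum over the density map."""
    return sum(sum(row) for row in density)


# Toy 2x2 example with made-up values.
density = [[0.5, 0.5], [0.25, 0.75]]   # stand-in for an S-DME output
mask = [[0.0, 1.0], [1.0, 1.0]]        # top-left pixel treated as background
labels = [[0, 1], [0, 1]]              # two hypothetical density levels

masked = apply_foreground_mask(density, mask)
local_maps = split_by_level(masked, labels, num_levels=2)
total = crowd_count(masked)
per_level = [crowd_count(m) for m in local_maps]
```

In this toy setup the per-level counts add up to the total count of the masked map, which is the property that makes per-region density maps convenient for inspecting where a counting error comes from.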

Metadata
Title
Single-column CNN for crowd counting with pixel-wise attention mechanism
Authors
Bisheng Wang
Guo Cao
Yanfeng Shang
Licun Zhou
Youqiang Zhang
Xuesong Li
Publication date
13 October 2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3810-9
