Published in: International Journal of Computer Vision, Issue 1/2021

30.08.2020

Pixel-Wise Crowd Understanding via Synthetic Data

Authors: Qi Wang, Junyu Gao, Wei Lin, Yuan Yuan

Abstract

Crowd analysis via computer vision techniques is an important topic in video surveillance, with widespread applications including crowd monitoring, public safety, and space design. Pixel-wise crowd understanding is the most fundamental task in crowd analysis because it yields finer-grained results for video sequences and still images than other analysis tasks. Unfortunately, pixel-level understanding requires a large amount of labeled training data, and annotating such data is expensive, so current crowd datasets are small. As a result, most algorithms suffer from over-fitting to varying degrees. In this paper, taking crowd counting and segmentation as examples of pixel-wise crowd understanding, we attempt to remedy these problems from two aspects, namely data and methodology. First, we develop a free data collector and labeler that generates synthetic, labeled crowd scenes in a computer game, Grand Theft Auto V. We then use it to construct a large-scale, diverse synthetic crowd dataset, named the "GCC Dataset". Second, we propose two simple methods to improve the performance of crowd understanding by exploiting the synthetic data. Specifically: (1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it using real data and labels, which makes the model perform better in the real world; (2) crowd understanding via domain adaptation: translate the synthetic data into photo-realistic images, then train the model on the translated data and labels, so that the trained model works well in real crowd scenes. Extensive experiments verify that the supervised algorithm outperforms the state of the art on four real datasets: UCF_CC_50, UCF-QNRF, and Shanghai Tech Part A/B. These results demonstrate the effectiveness and value of the synthetic GCC Dataset for pixel-wise crowd understanding.
The data collection/labeling tools, the proposed synthetic dataset, and the source code for the counting models are available at https://gjy3035.github.io/GCC-CL/.
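Pixel-wise crowd counting is commonly trained against density maps rather than raw counts: each annotated head point is spread into a normalized 2-D Gaussian, so the integral over the map equals the number of people, and a model is then regressed against this map pixel by pixel. The sketch below, a minimal pure-Python illustration (not the paper's implementation; the function name and parameters are illustrative), shows how such a ground-truth map is built from point annotations:

```python
import math

def density_map(points, h, w, sigma=4.0, radius=12):
    """Build a crowd density map from head-point annotations.

    Each annotated head contributes a Gaussian kernel normalized over a
    local window, so every person integrates to exactly 1 and the sum of
    the whole map equals the crowd count.
    """
    dmap = [[0.0] * w for _ in range(h)]
    for (px, py) in points:
        kernel = {}
        total = 0.0
        # accumulate an unnormalized Gaussian in a truncated window
        for y in range(max(0, py - radius), min(h, py + radius + 1)):
            for x in range(max(0, px - radius), min(w, px + radius + 1)):
                g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                kernel[(y, x)] = g
                total += g
        # normalize so this head contributes unit mass to the map
        for (y, x), g in kernel.items():
            dmap[y][x] += g / total
    return dmap

heads = [(10, 10), (30, 5), (30, 6)]   # (x, y) head annotations
dm = density_map(heads, h=40, w=40)
count = sum(sum(row) for row in dm)    # integrates to the number of heads
```

Because each kernel is normalized over its truncated window, the map's integral stays exact even for heads near the image border; in practice a perspective- or neighbor-adaptive sigma is often used instead of a fixed one.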

Metadata
Title
Pixel-Wise Crowd Understanding via Synthetic Data
Authors
Qi Wang
Junyu Gao
Wei Lin
Yuan Yuan
Publication date
30.08.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01365-4
